Voice cloning references
Ethics and governance:
- Attard-Frost, B., De los Ríos, A., & Walters, D. R. (2022). “The ethics of AI business practices: A review of 47 AI ethics guidelines”. AI and Ethics, 1-18.
- DeepMind. (2024). “AI summit policies”. Retrieved June 27, 2024, from https://deepmind.google/public-policy/ai-summit-policies/
- Encode Justice, & Future of Life Institute. (2023, October 25). “AI policy for a better future: On addressing both present harms and emerging threats”. Future of Life Institute. Retrieved June 26, 2024, from https://futureoflife.org/open-letter/ai-policy-for-a-better-future-on-addressing-both-present-harms-and-emerging-threats/
- Floridi, L. (2019). “Translating principles into practices of digital ethics: Five risks of being unethical”. Philosophy & Technology, 32(2), 185–193.
- GOV.UK. (2024, January 17). “Introducing the AI Safety Institute”. Department for Science, Innovation & Technology. https://www.gov.uk/government/publications/ai-safety-institute-overview/introducing-the-ai-safety-institute (Retrieved June 10, 2024)
- Hagendorff, T. (2020). “The ethics of AI ethics: An evaluation of guidelines”. Minds and Machines, 30(1), 99–120.
- Jobin, A., Ienca, M., & Vayena, E. (2019). “The global landscape of AI ethics guidelines”. Nature Machine Intelligence, 1(9), 389–399.
- Khan, A. A., Badshah, S., Liang, P., Waseem, M., Khan, B., Ahmad, A., Fahmideh, M., Niazi, M., & Akbar, M. A. (2022). “Ethics of AI: A Systematic Literature Review of Principles and Challenges”. In Proceedings of the International Conference on Evaluation and Assessment in Software Engineering 2022, EASE ’22 (pp. 383–392). Association for Computing Machinery.
- Right to Warn. (2024, June 4). “A Right to Warn about Advanced Artificial Intelligence”. Retrieved June 27, 2024, from https://righttowarn.ai/
Impacts and risks:
- BBC. (2023a, October 1). “Fake audio imitates Omar al-Bashir”. https://www.bbc.co.uk/news/world-africa-66987869
- BBC. (2023b, November 7). “Fake audio of Mayor of London”. https://www.bbc.co.uk/news/uk-england-london-67389609
- Burke, A. (2022, July). “Voice Cloning at Scale”. CETaS Expert Analysis. Retrieved June 20, 2024, from https://cetas.turing.ac.uk/publications/voice-cloning-scale
- CNN. (2023, April 29). “US mother threatened with kidnapping scam using AI voice”. https://edition.cnn.com/2023/04/29/us/ai-scam-calls-kidnapping-cec/index.html
- El Pais. (2023, February 22). “AI-generated propaganda in Venezuela”. https://english.elpais.com/international/2023-02-22/theyre-not-tv-anchors-theyre-avatars-how-venezuela-is-using-ai-generated-propaganda.html
- Express. (2023, May 17). “TikTok AI videos of murdered children spark outrage”. https://www.express.co.uk/news/uk/1771516/tiktok-ai-videos-murdered-children
- Forbes. (2021, October 14). “Bank fraud in Hong Kong”. https://www.forbes.com/sites/thomasbrewster/2021/10/14/huge-bank-fraud-uses-deep-fake-voice-tech-to-steal-millions/
- Independent. (2023, October 6). “Robin Williams’ daughter Zelda condemns AI recreation of his voice”. https://www.independent.co.uk/arts-entertainment/films/news/robin-williams-ai-voice-daughter-b2422506.html
- Kramer, M. (2024, February 26). “Steve Kramer explains why he used AI to impersonate President Biden in New Hampshire”. CBS New York. Retrieved June 8, 2024, from https://www.cbsnews.com/article/steve-kramer-explains-why-he-used-ai-to-impersonate-president-biden-in-new-hampshire
- Leader-Post. (2023, April 29). “Regina couple nearly scammed by AI voice cloning”. https://leaderpost.com/news/local-news/regina-couple-says-possible-ai-voice-scam-nearly-cost-them-9400
- Mike Cooper. (2023, March 11). “Voice over artist Mike Cooper’s voice used without consent”. https://www.mikecoopervoiceover.com/behind-the-mike/2023/3/11/send-in-the-clones
- Puig, A. (2023, March 20). “Scammers use AI to enhance their family emergency schemes”. Federal Trade Commission. Retrieved June 8, 2024, from https://www.ftc.gov/news-events/press-releases/2023/03/scammers-use-ai-enhance-their-family-emergency-schemes
- Reuters. (2023, October 12). “Fact check: Video does not show Joe Biden making transphobic remarks”. https://www.reuters.com/article/factcheck-biden-transphobic-remarks/fact-check-video-does-not-show-joe-biden-making-transphobic-remarks-idUSL1N34Q1IW
- Rose, J. (2024, June 25). “AI tools make it easy to clone someone’s voice without consent”. Proof News. https://www.proofnews.com/article123456
- Sky News. (2023, October 28). “New York’s mayor uses audio deepfakes”. https://news.sky.com/story/new-yorks-mayor-uses-audio-deepfakes-to-call-residents-in-languages-he-doesnt-speak-12986816
- The Guardian. (2023, March 16). “Vulnerability of Australian Bank & Tax Office”. https://www.theguardian.com/technology/2023/mar/16/voice-system-used-to-verify-identity-by-centrelink-can-be-fooled-by-ai
- The Verge. (2020, April 28). “Jay-Z deepfakes on YouTube”. https://www.theverge.com/2020/4/28/21240488/jay-z-deepfakes-roc-nation-youtube-removed-ai-copyright-impersonation
- Variety. (2023, April 17). “Fake AI-generated Drake and Weeknd collaboration”. https://variety.com/2023/music/news/fake-ai-generated-drake-weeknd-collaboration-heart-on-my-sleeve-1235585451/
- Vice. (2023a, May 10). “Voice actors doxxed with AI voices on Twitter”. https://www.vice.com/en/article/93axnd/voice-actors-doxed-with-ai-voices-on-twitter
- Vice. (2023b, April 28). “AI voice firm ElevenLabs under fire as 4chan users clone celebrity voices”. https://www.vice.com/en/article/dy7mww/ai-voice-firm-4chan-celebrity-voices-emma-watson-joe-rogan-elevenlabs
- Vice. (2023c, February 3). “A scammer is pretending to be Andrew Tate on TikTok and racking up millions of views”. https://www.vice.com/en/article/5d3n8z/a-scammer-is-pretending-to-be-andrew-tate-on-tiktok-and-racking-up-millions-of-views
- Vice. (2023d, March 17). “Not Jordan Peterson: Voice generator shut down after deepfakes controversy”. https://www.vice.com/en/article/43kwgb/not-jordan-peterson-voice-generator-shut-down-deepfakes
- Vice. (2023e, July 22). “AI-generated swatting calls wreak havoc across the US”. https://www.vice.com/en/article/k7z8be/torswats-computer-generated-ai-voice-swatting
- Wired. (2023, September 30). “Slovakia’s election deepfakes”. https://www.wired.com/story/slovakias-election-deepfakes-show-ai-is-a-danger-to-democracy/
- Yomiuri Shimbun. (2023, November 4). “Fake audio of Japan’s Prime Minister making vulgar statements sparks controversy”. https://japannews.yomiuri.co.jp/politics/politics-government/20231104-147695/
Technology and capabilities of voice cloning:
- Amazon Web Services. (2024). Amazon Polly Developer Guide. Retrieved July 1, 2024, from https://docs.aws.amazon.com/pdfs/polly/latest/dg/polly-dg.pdf#what-is
- Arık, S. O., Chrzanowski, M., Coates, A., et al. (2017). “Deep Voice: Real-time neural text-to-speech”. In Proceedings of the International Conference on Machine Learning, 195-204.
- Bonada, J., & Serra, X. (2007). “Synthesis of the singing voice by performance sampling and spectral models”. IEEE Signal Processing Magazine, 24, 69-79.
- Cambre, J., & Kulkarni, C. (2019). “One voice fits all? Social implications and research challenges of designing voices for smart devices”. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW). https://doi.org/10.1145/3359325
- Casanova, E., Weber, J., Shulby, C., Candido Junior, A., Gölge, E., & Ponti, M. A. (2021). “YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone”. arXiv. http://arxiv.org/abs/2112.02418
- Dautricourt, R. (2017, November 9). “Modify the timbre of Amazon Polly voices with the new vocal tract SSML feature”. AWS Machine Learning Blog. https://aws.amazon.com/it/blogs/machine-learning/modify-the-timbre-of-amazon-polly-voices-with-the-new-vocal-tract-ssml-feature/
- ElevenLabs Team. (2024, January 22). “What is Voice Cloning?” ElevenLabs. Retrieved June 20, 2024, from https://elevenlabs.io/blog/what-is-voice-cloning/
- Espinosa, M. N. (2023, March 21). “State of the art in voice cloning: A review”. Marvik AI Blog. Retrieved June 21, 2024, from https://blog.marvik.ai/2023/03/21/state-of-the-art-in-voice-cloning-a-review/
- Institute for Natural Language Processing (IMS). (n.d.). “IMS Toucan”. GitHub. Retrieved June 21, 2024, from https://github.com/digitalphonetics/ims-toucan
- Jia, Y., Zhang, Y., Weiss, R., et al. (2018). “Transfer learning from speaker verification to multispeaker text-to-speech synthesis”. Advances in Neural Information Processing Systems, 31.
- Kenmochi, H., & Ohshita, H. (2007). “Vocaloid – commercial singing synthesizer based on sample concatenation”. In Eighth Annual Conference of the International Speech Communication Association.
- Li, J., & Zhang, L. (2023). “ZSE-VITS: A Zero-Shot Expressive Voice Cloning Method Based on VITS”. Electronics, 12(4), 820. https://doi.org/10.3390/electronics12040820
- Lu, P., Wu, J., Luan, J., Tan, X., & Zhou, L. (2020). “XiaoiceSing: A high-quality and integrated singing voice synthesis system”. arXiv preprint arXiv:2006.06261.
- Luong, H. T., & Yamagishi, J. (2020). “NAUTILUS: A versatile voice cloning system”. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 2967-2981.
- Lux, F., Koch, J., & Vu, N. T. (2022). “Low-Resource Multilingual and Zero-Shot Multispeaker TTS”. arXiv. https://doi.org/10.48550/arXiv.2210.12223
- Ning, Y., He, S., Wu, Z., et al. (2019). “A review of deep learning based speech synthesis”. Applied Sciences, 9(19), 4050.
- Tan, X., Qin, T., Soong, F., & Liu, T.-Y. (2021). “A survey on neural speech synthesis”. arXiv. http://arxiv.org/abs/2106.15561
- Wang, C., Chen, S., Wu, Y., Zhang, Z., Zhou, L., Liu, S., Chen, Z., Liu, Y., Wang, H., Li, J., He, L., Zhao, S., & Wei, F. (2023). “Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers”. arXiv. https://doi.org/10.48550/arXiv.2301.02111
- Wu, Y., Zhao, H., Liang, X., & Sun, Y. (2023). “VStyclone: Real-time Chinese voice style clone”. Computers and Electrical Engineering, 105, Article 108534.
- Zhang, Y., Weiss, R. J., Zen, H., et al. (2019). “Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning”. In Proceedings of Interspeech 2019.
Voice cloning in education and singing synthesis:
- Algabri, H. K., Kharade, K. G., & Kamat, R. K. (2021). “Promise, threats, and personalization in higher education with artificial intelligence”. Webology, 18(6), Article 2129.
- Angrick, M., Herff, C., Mugler, E., et al. (2019). “Speech synthesis from ECoG using densely connected 3D convolutional neural networks”. Journal of Neural Engineering, 16(3), 036019.
- Anumanchipalli, G. K., Chartier, J., & Chang, E. F. (2019). “Speech synthesis from neural decoding of spoken sentences”. Nature, 568(7753), 493-498.
- Bielievtsov, D. (2024, April 26). “Elevate Language Education with Custom AI Voices: A Developer’s Guide”. Respeecher. Retrieved June 21, 2024, from https://www.respeecher.com/blog/elevate-language-education-with-custom-ai-voices-a-developers-guide
- Black, A. W. (2007). “Speech synthesis for educational technology”. In Proceedings of the Speech and Language Technology in Education (SLaTE) 2007 (pp. 104-107). https://doi.org/10.21437/slate.2007-25
- Blaauw, M., & Bonada, J. (2017). “A neural parametric singing synthesizer modeling timbre and expression from natural songs”. Applied Sciences, 7(12), 1313. V. Välimäki (Ed.), Special Issue on Sound and Music Computing. https://doi.org/10.3390/app7121313
- Blaauw, M., Bonada, J., & Daido, R. (2019). “Data efficient voice cloning for neural singing synthesis”. In Proceedings of the ICASSP IEEE International Conference on Acoustics, Speech and Signal Processing, 6840-6844.
- Cook, P. R. (1996). “Singing voice synthesis: History, current work, and future directions”. Computer Music Journal, 20(3), 38–46.
- Cursus.edu. (2023, June 14). “Voice Cloning in Education: Bridging Language Gaps in Learning Environments”. Retrieved June 19, 2024, from https://cursus.edu/en/30329/voice-cloning-in-education-bridging-language-gaps-in-learning-environments
- Eliav, A., Taub, A., Opochinsky, R., & Gannot, S. (2024). “SingIt! Singer Voice Transformation”. arXiv. https://doi.org/10.48550/arXiv.2405.04627
- Foxwell, A. (2020, September 9). “AI in Education: Voice in Schools”. ReadSpeaker. Retrieved June 21, 2024, from https://www.readspeaker.com/blog/ai-in-education-voice-in-schools/
- Foxwell, A. (2022, December 29). “AI in Education: Examples from the Field, Including Voice Technology”. ReadSpeaker. Retrieved June 20, 2024, from https://www.readspeaker.com/blog/voice-technology-in-education/
- Franzen, C. (2024, March 29). “Voice cloning is becoming the new normal in digital education”. VentureBeat. Retrieved June 19, 2024, from https://venturebeat.com/ai/voice-cloning-is-becoming-the-new-normal-in-digital-education/
- Group 10. (2023). “Can AI voice cloning be used for good?” USC Story Space. Retrieved June 19, 2024, from https://uscstoryspace.com/2023-2024/group10/
- Gu, Y., Yin, X., Rao, Y., Wan, Y., Tang, B., Zhang, Y., Chen, J., Wang, Y., & Ma, Z. (2021). “ByteSing: A Chinese singing voice synthesis system using duration allocated encoder-decoder acoustic models and WaveRNN vocoders”. In International Symposium on Chinese Spoken Language Processing (ISCSLP).
- Helding, L., & Ragan, K. (2022). “Evidence-Based Voice Pedagogy (EBVP), Part 3: Student Goals and Perspectives”. Journal of Singing, 78(5), 635-640. https://doi.org/10.53830/LOGC7063
- Hono, Y., Hashimoto, K., Oura, K., Nankaku, Y., & Tokuda, K. (2019). “Singing voice synthesis based on generative adversarial networks”. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6955–6959).
- Hono, Y., Hashimoto, K., Oura, K., Nankaku, Y., & Tokuda, K. (2021). “Sinsy: A deep neural network-based singing voice synthesis system”. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 2803-2815.
- Hsu, T. (2010, November 10). “Japanese pop star Hatsune Miku takes the stage — as a 3D hologram”. Los Angeles Times. Retrieved July 1, 2024, from https://www.latimes.com/archives/la-xpm-2010-nov-10-la-fi-hatsune-miku-20101110-story.html
- Hua, K. (2018). “Modeling singing F0 with neural network driven transition-sustain models”. arXiv:1803.04030. https://arxiv.org/abs/1803.04030
- iDict.ai. (2023). “Voice cloning for global success”. iDict.ai Blog. Retrieved June 19, 2024, from https://www.idict.ai/en/blog/voice-cloning-for-global-success
- iDict.ai. (2024). “The Future of Language Learning”. iDict.ai Blog. Retrieved June 20, 2024, from https://www.idict.ai/en/blog/the-future-of-language-learning
- Justus, Z. (2023, February 16). “Voice cloning for education: It is exactly like this picture”. Melts Into Air. Retrieved June 19, 2024, from https://www.meltsintoair.org/chatgpt/voice-cloning-for-education-it-is-exactly-like-this-picture
- Kar, T. (2021). “The use of analogy, imagination, and metaphors as an instructional tool in voice training: A case study”. Journal for the Interdisciplinary Art and Education, 2(1), 9-31.
- Kavitha, K. N. (2024, March 18). “Voice Cloning Technology: Shaping the Future of Communication”. ISME. Retrieved June 20, 2024, from https://www.isme.in/voice-cloning-technology-shaping-the-future-of-communication-prof-kavitha-k-n/
- Kim, J., Choi, H., Park, J., Hahn, M., Kim, S. J., & Kim, J. J. (2018). “Korean singing voice synthesis based on an LSTM recurrent neural network”. In INTERSPEECH (pp. 1551–1555).
- Lee, J., Choi, H., Jeon, C., Koo, J., & Lee, K. (2019). “Adversarially trained end-to-end Korean singing voice synthesis system”. In INTERSPEECH (pp. 2588–2592).
- Nakamura, K., Hashimoto, K., Oura, K., Nankaku, Y., & Tokuda, K. (2019). “Singing voice synthesis based on convolutional neural networks”. arXiv preprint arXiv:1904.06868.
- Nishimura, M., Hashimoto, K., Oura, K., Nankaku, Y., & Tokuda, K. (2016). “Singing voice synthesis based on deep neural networks”. In Conference of the International Speech Communication Association (INTERSPEECH) (pp. 2478–2482).
- Ockhuizen, G. (2024, April 9). “Breaking language barriers with Rask AI’s voice cloning for multilingual education”. Medium. https://medium.com/@kitchencreationclub/breaking-language-barriers-with-rask-ais-voice-cloning-for-multilingual-education-3384c82a2b68
- Pérez, A., Garcés Díaz-Munío, G., Giménez, A., Silvestre-Cerdà, J. A., Sanchis, A., Civera, J., Jiménez, M., Turró, C., & Juan, A. (2021). “Towards cross-lingual voice cloning in higher education”. Engineering Applications of Artificial Intelligence, 105, 104413. https://doi.org/10.1016/j.engappai.2021.104413
- Shirota, K., Nakamura, K., Hashimoto, K., Oura, K., Nankaku, Y., & Tokuda, K. (2014). “Integration of speaker and pitch adaptive training for HMM-based singing voice synthesis”. In Proceedings of the 39th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2559-2563). Florence, Italy.
- Syed, H. (2023, April 13). “AI in Education: Its Present and Its Future”. Play.ht. Retrieved June 20, 2024, from https://play.ht/blog/ai-in-education/
- Umbert, M., Bonada, J., Goto, M., Nakano, T., & Sundberg, J. (2015). “Expression control in singing voice synthesis: Features, approaches, evaluation, and challenges”. IEEE Signal Processing Magazine, 32(6), 55-73. https://doi.org/10.1109/MSP.2015.2437653
- Webb, M. (2024, March 5). “Voice cloning, AI twins and deepfakes: threat or useful tool? Or both?” Jisc. Retrieved June 21, 2024, from http://nationalcentreforai.jiscinvolve.org/wp/2024/03/05/voice-cloning-ai-twins-and-deepfakes-threat-or-useful-tool-or-both/
- Yu, H., & Guo, Y. (2023). “Generative artificial intelligence empowers educational reform: Current status, issues, and prospects”. Frontiers in Education, 8, 1183162. https://doi.org/10.3389/feduc.2023.1183162
Singing synthesis techniques:
- Chen, X., Wu, H., Jang, J.-S. R., & Lee, H.-y. (2024). “Singing Voice Graph Modeling for SingFake Detection”. arXiv. https://doi.org/10.48550/arXiv.2406.03111
- Goto, M., Nakano, T., Kajita, S., Matsusaka, Y., Nakaoka, S., & Yokoi, K. (2012). “VocaListener and VocaWatcher: Imitating a human singer by using signal processing”. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5393–5396). IEEE.
- Kim, S., Jeong, M., Lee, H., Kim, M., Choi, B. J., & Kim, N. S. (2024). “MakeSinger: A semi-supervised training method for data-efficient singing voice synthesis via classifier-free diffusion guidance”. arXiv. https://doi.org/10.48550/arXiv.2406.05965
- Li, H., Wang, H., Chen, Z., Sun, B., & Li, B. (2024). “Real-Time and Accurate: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis”. arXiv. https://doi.org/10.48550/arXiv.2405.15093
- Li, R., Huang, R., Wang, Y., Hong, Z., & Zhao, Z. (2024). “Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion”. arXiv. https://doi.org/10.48550/arXiv.2406.02429
- Lux, F., Meyer, S., Behringer, L., Zalkow, F., Do, P., Coler, M., Habets, E. A. P., & Vu, N. T. (2024). “Meta Learning Text-to-Speech Synthesis in over 7000 Languages”. arXiv. https://doi.org/10.48550/arXiv.2406.06403
- Saino, K., Zen, H., Nankaku, Y., Lee, A., & Tokuda, K. (2006). “An HMM-based singing voice synthesis system”. In International Conference on Spoken Language Processing.
- Tang, Y., Shi, J., Wu, Y., & Jin, Q. (2024). “SingMOS: An extensive open-source singing voice dataset for MOS prediction”. arXiv. https://doi.org/10.48550/arXiv.2406.10911
- Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T., & Kitamura, T. (2000). “Speech parameter generation algorithms for HMM-based speech synthesis”. In IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (pp. 1315-1318).
- Wang, J., Li, P., Zhang, X., Cheng, N., & Xiao, J. (2024). “Singing Voice Beautifying with Pitch and Expressiveness Condition”. arXiv. https://doi.org/10.48550/arXiv.2404.19187
- Wu, Y., Cho, C., Lee, Y. H., & Kim, T. (2024). “Period Singer: Integrating periodic and aperiodic variational autoencoders for natural-sounding end-to-end singing voice synthesis”. arXiv. https://doi.org/10.48550/arXiv.2406.09894
- Wu, Y., Zhang, C., Shi, J., Tang, Y., Yang, S., & Jin, Q. (2024). “TokSing: Singing voice synthesis based on discrete tokens”. arXiv. https://doi.org/10.48550/arXiv.2406.08416
- Zhuang, X., Jiang, T., Chou, S.-Y., Wu, B., Hu, P., & Lui, S. (2021). “LiteSing: Towards fast, lightweight and expressive singing voice synthesis”. In ICASSP (pp. 7078-7082).
Audio security and anti-spoofing:
- Federal Trade Commission. (2024). “The FTC Voice Cloning Challenge”. FTC. Retrieved June 20, 2024, from https://www.ftc.gov/news-events/contests/ftc-voice-cloning-challenge
- Fung, B. (2024, February 9). “FCC votes to ban scam robocalls that use AI-generated voices”. CNN. Retrieved June 8, 2024, from https://edition.cnn.com/2024/02/08/tech/fcc-scam-robocalls-ai-generated-voices/index.html
- Javed, A., Malik, K. M., Irtaza, A., & Malik, H. (2021). “Towards protecting cyber-physical and IoT systems from single- and multi-order voice spoofing attacks”. Applied Acoustics, 183, 108283. https://doi.org/10.1016/j.apacoust.2021.108283
- TorrentFreak. (2023, June 22). “RIAA targets AI Hub server”. https://torrentfreak.com/riaa-targets-ai-hub-discord-users-over-copyright-infringement-230622/
- Wang, K., Liu, X., Chen, C. M., et al. (2020). “Voice-transfer attacking on industrial voice control systems in 5G-aided IIoT domain”. IEEE Transactions on Industrial Informatics, 17(10), 7085-7092.
- Yamagishi, J., Wang, X., Todisco, M., Sahidullah, M., Patino, J., Nautsch, A., et al. (2021). “ASVspoof 2021: Accelerating progress in spoofed and deepfake speech detection”. In Proceedings of the 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge (pp. 47–54).
- Zhang, J., Tu, G., Liu, S., & Cai, Z. (2023). “Audio anti-spoofing based on audio feature fusion”. Algorithms, 16(7), 317. https://doi.org/10.3390/a16070317
Human-AI interaction:
- Broadbent, E. (2017). “Interactions with robots: The truths we reveal about ourselves”. Annual Review of Psychology, 68, 627–652. https://doi.org/10.1146/annurev-psych-010416-043958
- Hoover, A. (2023, October 3). “Voice actors are bracing to compete with talking AI”. WIRED. Retrieved July 1, 2024, from https://www.wired.com/story/ai-voice-actors-jobs-threat/
- Lee, K., Hitt, G., Terada, E., Lee, J. H., & Gaudio Lab. (2022). “Ethics of singing voice synthesis: Perceptions of users and developers”. In Proceedings of the 23rd International Society for Music Information Retrieval Conference. https://osf.io/7em95/
- Mori, M. (1970). “The uncanny valley”. Energy, 7(4), 33–35.
- Mori, M., MacDorman, K. F., & Kageki, N. (2012). “The uncanny valley”. IEEE Robotics & Automation Magazine, 19(2), 98–100.
- Murphy, M. (2021). “Voicing the clone: Laurie Anderson and technologies of reproduction”. Feminist Review, 127(1), 56-72.
- Rosenthal-von der Pütten, A. M., Krämer, N. C., Maderwald, S., Brand, M., & Grabenhorst, F. (2019). “Neural mechanisms for accepting and rejecting artificial social partners in the Uncanny Valley”. Journal of Neuroscience, 39(33), 6555-6570. https://doi.org/10.1523/JNEUROSCI.2956-18.2019
- Scott, K. M., Ashby, S., & Hanna, J. (2020). “‘Human, all too human’: NOAA Weather Radio and the emotional impact of synthetic voices”. In Proceedings of the CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery. https://doi.org/10.1145/3313831.3376338
- Vainilavičius, J. (2024, March 15). “Uh, robot: stuttering Figure 01 sends viewers into uncanny valley”. Cybernews. Retrieved July 1, 2024, from https://cybernews.com/tech/figure-uncanny-valley-openai/
Technical implementation details:
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. The MIT Press.
- Gritsenko, A., Salimans, T., van den Berg, R., et al. (2020). “A spectral energy distance for parallel speech synthesis”. Advances in Neural Information Processing Systems, 33, 13062-13072.
- Li, N., Liu, S., Liu, Y., et al. (2019). “Neural speech synthesis with transformer network”. In Proceedings of the AAAI Conference on Artificial Intelligence, 33, 6706-6713.
- Mathew, A., Amudha, P., & Sivakumari, S. (2021). “Deep Learning Techniques: An Overview”. In A. E. Hassanien et al. (Eds.), Advanced Machine Learning Technologies and Applications (Advances in Intelligent Systems and Computing, Vol. 1141). Springer Nature Singapore Pte Ltd. https://doi.org/10.1007/978-981-15-3383-9_54
- Neekhara, P., Hussain, S., Dubnov, S., et al. (2021). “Expressive neural voice cloning”. In Proceedings of the Asian Conference on Machine Learning, 252-267.
- Prenger, R., Valle, R., & Catanzaro, B. (2019). “Waveglow: A flow-based generative network for speech synthesis”. In Proceedings of the ICASSP IEEE International Conference on Acoustics, Speech and Signal Processing, 3617-3621.
- Ren, Y., Hu, C., Tan, X., et al. (2021). “FastSpeech 2: Fast and high-quality end-to-end text to speech”. In Proceedings of the International Conference on Learning Representations.
- Sherstinsky, A. (2020). “Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network”. Physica D: Nonlinear Phenomena, 404, 132306.
- Sotelo, J., Mehri, S., Kumar, K., et al. (2017). “Char2Wav: End-to-end speech synthesis”. International Conference on Learning Representations.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). “Attention is all you need”. In Advances in neural information processing systems.
- Wang, Y., Skerry-Ryan, R. J., Stanton, D., et al. (2017). “Tacotron: Towards end-to-end speech synthesis”. In Proceedings of the Interspeech, 4006-4010.
- Zhao, Y., Takaki, S., Luong, H. T., et al. (2018). “Wasserstein GAN and waveform loss-based acoustic model training for multi-speaker text-to-speech synthesis systems using a WaveNet vocoder”. IEEE Access, 6, 60478-60488.
Risk assessment and policy:
- Blok, V. (2023). Philosophy of technology in the digital age: The datafication of the world, the homo virtualis, and the capacity of technological innovations to set the world free. Wageningen, Netherlands: Wageningen University & Research. https://doi.org/10.18174/555566
- Bond, S. (2024, May 23). “A political consultant faces charges and fines for Biden deepfake robocalls”. NPR. Retrieved June 8, 2024, from https://www.npr.org/2024/05/23/nx-s1-4977582/fcc-ai-deepfake-robocall-biden-new-hampshire-political-operative
- Ernst, E., Merola, R., & Samaan, D. (2019). “Economics of artificial intelligence: Implications for the future of work”. IZA Journal of Labor Policy, 9(1). https://doi.org/10.2478/izajolp-2019-0004
- Gornet, M., & Viard, T. (2023). “Mapping AI Ethics: A Quantitative Analysis of the Plurality, and Lack Thereof, of Discourses”. http://dx.doi.org/10.2139/ssrn.4582657
- Hutiri, W., Papakyriakopoulos, O., & Xiang, A. (2024). “Not my voice! A taxonomy of ethical and safety harms of speech generators”. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24). ACM.
- Jacobson, W. P. (2011). “The robot’s record: Protecting the value of intellectual property in music when automation drives the marginal costs of music production to zero”. Loyola of Los Angeles Entertainment Law Review, 32, 31.
- Kirk, H. R., Vidgen, B., Röttger, P., & Hale, S. A. (2023). “Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback”. arXiv. https://doi.org/10.48550/arXiv.2303.05453
- Nganyewou Tidjon, L., & Khomh, F. (2022). “The different faces of AI ethics across the world: A principle-implementation gap analysis”. arXiv preprint arXiv:2206.03225.
- Oinas-Kukkonen, H., Pohjolainen, S., & Agyei, E. (2022). “Mitigating issues with/of/for true personalization”. Frontiers in Artificial Intelligence, 5. https://doi.org/10.3389/frai.2022.844817
- Papadopoulos, T., & Charalabidis, Y. (2020). “What do governments plan in the field of artificial intelligence?: Analysing national AI strategies using NLP”. In Proceedings of the 13th International Conference on Theory and Practice of Electronic Governance (pp. 100–111). Association for Computing Machinery.
- Rest of World. (2023, February 22). “Foreign-owned voice cloning companies impact Latin American voice actors”. https://restofworld.org/2023/ai-voice-acting/
- Rivera, J.-P., Mukobi, G., Reuel, A., Lamparth, M., Smith, C., & Schneider, J. (2024). “Escalation risks from language models in military and diplomatic decision-making”. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency.
- Sabin, S. (2023, June 13). “Generative AI is making voice scams easier to believe”. Axios. Retrieved June 8, 2024, from https://www.axios.com/articles/generative-ai-making-voice-scams-easier-to-believe
- Scola, N. (2023, November 2). “Biden’s elusive AI whisperer finally goes on the record. Here’s his warning”. Politico. Retrieved June 27, 2024, from https://www.politico.com/news/magazine/2023/11/02/bruce-reed-ai-biden-tech-00124375
- Shelby, R., Rismani, S., Henne, K., Moon, A., Rostamzadeh, N., Nicholas, P., Yilla, N., Gallegos, J., Smart, A., Garcia, E., & Virk, G. (2023). “Sociotechnical harms of algorithmic systems: Scoping a taxonomy for harm reduction”. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (AIES). ACM. https://doi.org/10.1145/3600211.3604673
- Stupp, C. (2019, August 30). “Fraudsters used AI to mimic CEO’s voice in unusual cybercrime case”. The Wall Street Journal. WSJ PRO. Retrieved June 8, 2024, from https://www.wsj.com/articles/fraudsters-used-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case
- UK Government Office for Science. (2023). “Frontier AI capabilities and risks: Frontier model and Foundation model risk analysis”. Retrieved June 27, 2024, from https://assets.publishing.service.gov.uk/media/65395abae6c968000daa9b25/frontier-ai-capabilities-risks-report.pdf
- Vincent, J. (2023, December 18). “Imran Khan’s political party uses AI voice cloning technology for election campaign”. The Verge. Retrieved June 27, 2024, from https://www.theverge.com/2023/12/18/24006968/imran-khan-ai-pakistan-prime-minister-voice-clone-elevenlabs
- Weidinger, L., Rauh, M., Marchal, N., Manzini, A., Hendricks, L. A., Mateos-Garcia, J., Bergman, S., Kay, J., Griffin, C., Bariach, B., Gabriel, I., Rieser, V., & Isaac, W. (2023). “Sociotechnical safety evaluation of generative AI systems”. arXiv.
- Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P. S., Mellor, J., Glaese, A., Cheng, M., Balle, B., Kasirzadeh, A., Biles, C., Brown, S., Kenton, Z., Hawkins, W., Stepleton, T., Birhane, A., Hendricks, L. A., Rimell, L., Isaac, W., Haas, J., Legassick, S., Irving, G., & Gabriel, I. (2022). “Taxonomy of risks posed by language models”. In ACM Conference on Fairness, Accountability and Transparency (FAccT) (pp. 214–229). Association for Computing Machinery. https://doi.org/10.1145/3531146.3533088
- Whitney, C. D., & Norman, J. (2024). “Real risks of fake data: Synthetic data, diversity-washing and consent circumvention”. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. https://doi.org/10.1145/3630106.3659002
Other resources:
- Anthropic. (2023, March 8). “Core views on AI safety: When, why, what, and how”. Retrieved June 26, 2024, from https://www.anthropic.com/news/core-views-on-ai-safety
- CNN. (2021, January 25). “South Korea’s Kim Kwang-seok AI show sparks ethical debate”. https://edition.cnn.com/2021/01/25/asia/south-korea-kim-kwang-seok-ai-dst-hnk-intl/index.html
- Digital Music News. (2023, May 30). “Singaporean singer replaced by AI clone”. https://www.digitalmusicnews.com/2023/05/30/singaporean-singer-stefanie-sun-career-hijacked-ai/
- Figure. (2024, March 15). “Figure status update - OpenAI speech-to-speech reasoning” [Video]. YouTube. Retrieved July 1, 2024, from https://www.youtube.com/watch?v=Sq1QZB5baNw
- Fjeld, J., Achten, N., Hilligoss, H., Nagy, A., & Srikumar, M. (2020). “Principled artificial intelligence: Mapping consensus in ethical and rights-based approaches to principles for AI”. Berkman Klein Center Research Publication, (2020-1).
- Hassan, J. (2023). “AI is being used to give dead, missing kids a voice they didn’t ask for”. Retrieved June 26, 2024, from https://www.washingtonpost.com/technology/2023/08/09/ai-dead-children-tiktok-videos/
- Hidalgo Lopez, J. C., Sandeep, S., Wright, M., Wandell, G. M., & Law, A. B. (2023). “Quantifying and improving the performance of speech recognition systems on dysphonic speech”. Otolaryngology–Head and Neck Surgery, 168(5), 1130–1138. https://doi.org/10.1002/ohn.170
- Ivanov, S., & Webster, C. (2017). “Adoption of robots, artificial intelligence and service automation by travel, tourism and hospitality companies – a cost-benefit analysis”. In International Scientific Conference “Contemporary tourism – traditions and innovations”, 19-21 October 2017, Sofia University.
- Jordon, J., Szpruch, L., Houssiau, F., Bottarelli, M., Cherubin, G., Maple, C., Cohen, S. N., & Weller, A. (2022). “Synthetic Data – What, Why and How?” arXiv:2205.03257
- Khosravi, H., Shum, S. B., Chen, G., Conati, C., Tsai, Y., Kay, J., et al. (2022). “Explainable Artificial Intelligence in education”. Computers and Education: Artificial Intelligence, 3, 100074. https://doi.org/10.1016/j.caeai.2022.100074
- Koffi, E., & Petzold, M. (2022). “A tutorial on formant-based speech synthesis for the documentation of critically endangered languages”. Linguistic Portfolio, 11(1), 3.
- Lave, J., & Wenger, E. (1991). Situated Learning: Legitimate Peripheral Participation. Cambridge: Cambridge University Press.
- Lim, W. M., Gunasekara, A. N., Pallant, J. L., Pallant, J. I., & Pechenkina, E. (2023). “Generative AI and the future of education: Ragnarök or reformation? A paradoxical perspective from management educators”. The International Journal of Management Education, 21, 100790. https://doi.org/10.1016/j.ijme.2023.100790
- Lim, W. M., Kumar, S., Verma, S., & Chaturvedi, R. (2022). “Alexa, what do we know about conversational commerce? Insights from a systematic literature review”. Psychology & Marketing, 39, 1129–1155. https://doi.org/10.1002/mar.21654
- Maples, M. (1979). “A Humanistic Education: Basic Ingredients”. The Humanistic Educator, 17(3), 107–110.
- McAllister, T., & Ballard, K. J. (2018). “Bringing advanced speech processing technology to the clinical management of speech disorders” (pp. 581–582).
- Motos, V. (2024, February 22). “Guide: The Rise of Voice Cloning Technology”. Medium. Retrieved June 20, 2024, from https://medium.com/be-tech-with-santander/guide-the-rise-of-voice-cloning-technology-8634f6d66472
- People. (2023, April 15). “Bad Bunny is furious about an AI track using his voice”. https://people.com/bad-bunny-is-furious-about-an-ai-track-using-his-voice-8399608
- Prajwal, K. R., Mukhopadhyay, R., Namboodiri, V. P., et al. (2020). “Learning individual speaking styles for accurate lip to speech synthesis”. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 13796–13805).
- Sackett, D. L., Rosenberg, W. M. C., Gray, J. A. M., Haynes, R. B., & Richardson, W. S. (1996). “Evidence-Based Medicine: What It Is and What It Isn’t”. BMJ, 312(7023), 71–72.
- Statista. (2024). “Digital population worldwide”. Statista. Retrieved July 1, 2024, from https://www.statista.com/statistics/617136/digital-population-worldwide/
- The Guardian. (2022, June 23). “Amazon Alexa could turn dead loved ones into digital assistants”. https://www.theguardian.com/technology/2022/jun/23/amazon-alexa-could-turn-dead-loved-ones-digital-assistant
- Vashishtha, S., & Susan, S. (2020). “Inferring sentiments from supervised classification of text and speech cues using fuzzy rules”. Procedia Computer Science, 167, 1370–1379.
- Vice. (2023f, March 17). “Voice cloning to trick banks”. https://www.vice.com/en/article/dy7axa/how-i-broke-into-a-bank-account-with-an-ai-generated-voice
- Wood, S. G., Moxley, J. H., Tighe, E. L., & Wagner, R. K. (2018). “Does use of text-to-speech and related read-aloud tools improve reading comprehension for students with reading disabilities? A meta-analysis”. Journal of Learning Disabilities, 51(1), 73–84. https://doi.org/10.1177/0022219416688170
- WSJ. (2019, August 30). “Fraudsters used AI to mimic CEO’s voice in unusual cybercrime case”. https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-11567157402