By MATT O’BRIEN and BARBARA ORTUTAY, AP Technology Editors
The revelation that a documentary filmmaker used voice cloning software to get the late chef Anthony Bourdain to say words he never said has drawn criticism amid ethical concerns about the use of the powerful technology.
The movie “Roadrunner: A Film About Anthony Bourdain” hit theaters Friday and features mostly real footage of the celebrity chef and globe-trotting TV host before his death in 2018. But its director, Morgan Neville, told The New Yorker that one snippet of dialogue was created using artificial intelligence technology.
This has renewed a debate on the future of voice cloning technology, not only in the entertainment world, but also in politics and a rapidly growing business sector dedicated to transforming text into realistic human speech.
“Unapproved voice cloning is a slippery slope,” Andrew Mason, founder and CEO of voice generator Descript, said in a blog post Friday. “As soon as you get into a world where you’re making subjective judgment calls about whether specific cases can be ethical, it won’t be long before anything goes.”
Prior to this week, most of the public controversy around these technologies focused on the creation of hard-to-detect deepfakes using audio and/or video simulations and their potential to fuel disinformation and political conflict.
But Mason, who previously founded and ran Groupon, said in an interview that Descript has repeatedly rejected requests to bring back a voice, including from “people who have lost someone and are grieving.”
“It’s not even so much that we want to pass judgment,” he said. “We’re just saying you have to have some bright lines between what’s OK and what’s not.”
The angry and uncomfortable reactions to voice cloning in the Bourdain case reflect expectations around disclosure and consent, said Sam Gregory, program director at Witness, a nonprofit working on the use of video technology for human rights. Obtaining consent and disclosing that the technology was being used would have been appropriate, he said. Instead, viewers were stunned, first by the fake audio itself and then by the director’s apparent dismissal of any ethical concerns, and they expressed their displeasure online.
“It also touches on our fears of death and our ideas of how people could take control of our digital likeness and make us say or do things with no way to stop it,” Gregory said.
Neville did not identify the tool he used to recreate Bourdain’s voice, but said he used it for a few sentences Bourdain wrote but never said out loud.
“With the blessing of his estate and literary agent, we used AI technology,” Neville said in a written statement. “It was a modern storytelling technique that I used in a few places where I felt it was important to make Tony’s words come alive.”
Neville also told GQ magazine that he got the approval of Bourdain’s widow and literary executor. The chef’s wife, Ottavia Busia, responded by tweeting: “I was definitely NOT the one who said Tony would have been cool with this.”
While tech giants like Microsoft, Google, and Amazon have dominated text-to-speech research, there are now also a number of startups like Descript that offer voice cloning software. Uses range from talking customer service chatbots to video games and podcasting.
Many of these voice cloning companies feature an ethics policy on their website that explains the terms of service. Of nearly a dozen companies contacted by The Associated Press, many said they had not recreated Bourdain’s voice and would not have done so if asked. Others did not respond.
“We have pretty strict policies about what can be done on our platform,” said Zohaib Ahmed, founder and CEO of Resemble AI, a Toronto-based company that sells a customized AI voice generator service. “When you create a voice clone, it requires consent from whoever’s voice it is.”
Ahmed said the rare occasions he allowed posthumous voice cloning were for academic research, including a project working with the voice of Winston Churchill, who died in 1965.
Ahmed said a more common commercial use is to take a TV commercial recorded by real voice actors and customize it for a particular region by adding a local reference. The technology is also used to dub animated films and other videos, taking a voice in one language and making it speak a different language, he said.
He compared it to past innovations in the entertainment industry, from stuntmen to green screen technology.
A few seconds or minutes of recorded human speech can help teach an AI system to generate its own synthetic speech, although capturing the clarity and rhythm of Anthony Bourdain’s voice probably took a lot more training, said Rupal Patel, a professor at Northeastern University who runs another voice generation company, VocaliD, which focuses on customer service chatbots.
“If you wanted it to really sound like him, you would need a lot, maybe 90 minutes of good, clean data,” she said. “You are building an algorithm that learns to speak like Bourdain spoke.”
Neville is an acclaimed documentary filmmaker who also made the Fred Rogers portrait “Won’t You Be My Neighbor?” and the Oscar-winning “20 Feet From Stardom.” He began making his latest film in 2019, more than a year after Bourdain’s death by suicide in June 2018.
Copyright 2021 The Associated Press. All rights reserved. This material may not be published, broadcast, rewritten or redistributed.