It’s Time to Protect Yourself From AI Voice Scams
Anyone can create a convincing clone of a stranger’s voice. What now?
This month, a local TV-news station in Arizona ran an unsettling report: A mother named Jennifer DeStefano says that she picked up the phone to the sound of her 15-year-old crying out for her, and was asked to pay a $1 million ransom for her daughter’s return. In reality, the teen had not been kidnapped, and was safe; DeStefano believes someone used AI to create a replica of her daughter’s voice to deploy against her family. “It was completely her voice,” she said in one interview. “It was her inflection. It was the way she would have cried.” DeStefano’s story has since been picked up by other outlets, while similar stories of AI voice scams have surfaced on TikTok and been reported by The Washington Post. In late March, the Federal Trade Commission warned consumers that bad actors are using the technology to supercharge “family-emergency schemes,” scams that fake an emergency to fool a concerned loved one into forking over cash or private information.
Such applications have existed for some time—my colleague Charlie Warzel fooled his mom with a rudimentary AI voice-cloning program in 2018—but they’ve gotten better, cheaper, and more accessible in the past several months alongside a generative-AI boom. Now anyone with a dollar, a few minutes, and an internet connection can synthesize a stranger’s voice. What’s at stake is our ability as regular people to trust that the voices of those we interact with from afar are legitimate. We could soon be in a society where you don’t necessarily know that any call from your mom or boss is actually from your mom or boss. We may not be at a crisis point for voice fraud, but it’s easy to see one on the horizon. Some experts say it’s time to establish systems with your loved ones to guard against the possibility that your voices are synthesized—code words, or a kind of human two-factor authentication.
One easy way to combat such trickery would be to designate a word with your contacts that could be used to verify your identity. You could, for example, establish that any emergency request for money or sensitive information should include the term lobster bisque. The Post’s Megan McArdle made this case in a story yesterday, calling it an “AI safeword.” Hany Farid, a professor at the UC Berkeley School of Information, told me he’s a fan of the idea. “It’s so low-tech,” he said. “You’ve got this super-high-tech technology, voice cloning, and you’re like, ‘What’s the code word, asshole?’”
But we also should be wary of getting paranoid too quickly. A broader loss of trust in any audio, or video for that matter, could feed the “liar’s dividend,” or the idea that more public knowledge about fakes can make it easier for bad actors to undermine legitimate media. America doesn’t exactly have a surplus of trust right now: Faith in media and institutions, including organized religion and public schools, is polling miserably, at the same time that AI is amplifying the ability to spread false information online. “We want people to be aware of what’s possible,” Henry Ajder, an AI expert who has been studying synthetic voice technology for half a decade, told me. “We also don’t want to just absolutely terrify people.” If you do get an out-of-the-ordinary call, you can always just stay calm and ask commonsense questions that your loved ones should know how to answer, Ajder said.
Beyond the anecdotes, data about AI voice scams are practically nonexistent. Juliana Gruenwald, a spokesperson for the FTC, told me that the agency does not track how often AI or voice cloning is used in scams. Fraud-report statistics for the first three months of this year don’t show an increase in the number of scams involving the impersonation of family and friends. The FBI, which also keeps data on phone scams, did not respond to a request for comment.
Still, there’s clearly genuine risk here. Last month, for a story about the proliferation of such clones on TikTok, I replicated Taylor Swift’s voice using just one minute of audio of her talking in an old interview on YouTube. It took five minutes and cost $1 using the online Instant Voice Cloning tool from ElevenLabs. (The company did not respond to a request for comment about how its software could be used in scams.) All the program needs is a short audio clip of the person speaking: Upload it, and the AI will do the rest. And you don’t have to be a high-profile figure to be vulnerable. It simply takes one public audio clip of you, perhaps pulled from a TikTok or an Instagram post or a YouTube vlog, and anyone can create an AI model of your voice that they can use however they choose. Our extensive digital histories, built over years of life online, can be used against us.
Although the technology feels like it’s lifted from a Philip K. Dick novel, this is, in a sense, a classic American story about the uncertainty of a new frontier. The historian Susan Pearson, who wrote The Birth Certificate: An American History, told me that when more Americans began moving from the countryside to cities in the mid-19th century, the country developed “a real cultural fascination” with swindlers and an “anxiety about being in these new large spaces, where all kinds of strangers are going to interact and you don’t necessarily know who you can trust.” We developed technologies like credit scores, for better or worse, so that we might know who we were doing business with. The expanse of the AI-powered internet is perhaps a corollary to that earlier fear.
We’re in a period of change, trying to figure out the benefits and costs of these tools. “I think this is one of those cases where we built it because we could and/or because we can make money from it,” Farid said. “And maybe nobody stopped to think whether they should be doing it.” There are some legitimate use cases for voice cloning: It could empower a person who has become impaired or lost their ability to use their own voice, for instance. In 2021, AI helped the actor Val Kilmer use his voice when he lost his natural ability to speak as a result of throat cancer. But the beneficial uses don’t necessarily require unregulated, free-for-all access, Farid pointed out.
Many critics of AI have said we should slow down and think a little more about what the technology might unleash if left alone. Voice cloning seems like an area in which we really ought to do so. Perhaps humans will evolve alongside AI and create new verification technologies that help us restore trust, but fundamentally, once we start doubting that the person on the other end of the line is really the person we want it to be, we’ve entered an entirely new world. Maybe we’re already there.