When Roger Ebert lost his lower jaw—and, thus, his voice—to cancer, the text-to-speech company CereProc created a synthetic voice that would be custom-made for the film critic. The computerized voice, a fusion of the words Ebert had recorded in his long career, would not sound fully natural; it would, however, sound distinctive. It was meant to help Ebert regain something he had lost with the removal of his vocal cords: a voice of his own.
Most people are not so lucky. Those who have had strokes, or who live with ailments like Parkinson's or cerebral palsy, often rely on versions of synthetic voices that are completely generic in their delivery. (Think of Stephen Hawking's computerized monotone. Or of Alex, the voice of Apple's VoiceOver software.) The good news is that these people can make themselves heard; the bad news is that they have still been robbed of one of the most powerful things a voice can give us: a unique, and audible, identity.
Up in Boston, Rupal Patel is hoping to change that. She and her collaborator, Tim Bunnell of the Nemours Alfred I. duPont Hospital for Children, have for several years been developing algorithms that build voices for those who are unable to speak without computer assistance. The voices aren't just natural-sounding; they're also unique. They're vocal prosthetics, essentially, tailored to the existing voices (and, more generally, the identities) of their users.
They're premised on the idea, Patel told me, that technology now allows us to think about the voice "just like we think about fonts for written text."
It works like this: Volunteers come to a studio and read through several thousand sample sentences (sourced from books like White Fang and The Wonderful Wizard of Oz). Patel, Bunnell, and their team then take recordings of a recipient's own voice, if possible, to get a sense of its pitch and tone. (If the recipient has no voice at all, they select for things like gender, age, and regional origin.) Next, the team breaks the donor recordings down into micro-units of speech (with, for example, a single vowel consisting of several of those units). Then, using software they created, called VocaliD, they blend the two voice samples together to create a new, lab-engineered lexicon: an acoustic collection of words at the disposal of a person who needs them to communicate.
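VocaliD's actual algorithms are not public, but the blending idea described above can be sketched in toy form. The sketch below is purely illustrative: it imagines each donor speech unit as a phoneme paired with a pitch value, and nudges those pitches toward a pitch measured from the recipient's own voice. All of the names and numbers here are hypothetical.

```python
# Toy illustration of the voice-blending idea (not VocaliD's real method).
# A donor's speech is represented as micro-units: (phoneme, pitch_in_hz).
# The recipient contributes an average pitch, and each donor unit's pitch
# is shifted toward it by a weighting factor.

def blend_voice(donor_units, recipient_pitch_hz, weight=0.8):
    """Shift each donor unit's pitch toward the recipient's pitch.

    donor_units: list of (phoneme, pitch_hz) pairs from donor recordings.
    recipient_pitch_hz: average pitch measured from the recipient's voice.
    weight: how strongly the recipient's pitch dominates the blend.
    """
    blended = []
    for phoneme, donor_pitch in donor_units:
        new_pitch = weight * recipient_pitch_hz + (1 - weight) * donor_pitch
        blended.append((phoneme, round(new_pitch, 1)))
    return blended

# A deep-voiced recipient (120 Hz) borrowing a higher-pitched donor's units:
units = [("ah", 210.0), ("oo", 195.0)]  # hypothetical donor data
print(blend_voice(units, recipient_pitch_hz=120.0))
```

Real unit-selection synthesis blends far more than pitch (timbre, duration, and spectral detail all matter), but the weighted-mix intuition is the same: the donor supplies the raw speech material, and the recipient's measurements steer how it sounds.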
This is, despite the algorithmic assistance, a painstaking process. Creating a voice that is simply usable, New Scientist notes, requires a donor to read at least (at least!) 800 sentences. And coming up with a voice that sounds relatively natural requires 3,000 sentences to be read aloud. Plus, the current system—human recording combined with algorithmic remixing—requires the physical presence of voice donors. "Right now," Patel told me, "our process is to call people into the laboratory—and that doesn't scale."
Despite all those impediments, though, people seem to be interested in lending their voices to those in need. Patel, in her capacity as an associate professor at Northeastern University, is now developing the Human Voicebank Initiative, a project that aims to create a repository of human voices that can be donated to people who don't have voices of their own. The initiative currently has more than 10,000 people registered as voice donors, Patel says. She and her team are in the process of building up the project's tech infrastructure, developing tools like a web client and an iPhone app that will allow donors to do their own recordings in their own time.
It's an appropriate use, perhaps, of the devices that will increasingly call on human voices for their commands. "When we're thinking about technologies that you and I use and rely on, we're now going to use speech much more," Patel says. "We talk to our phones, and our phones talk to us."