Most people who cannot speak, though, do not have the luxury of being Stephen Hawking.
* * *
An estimated eight out of every 1,000 Americans, or 2.5 million people, are severely speech-impaired due to a variety of conditions: head injuries, congenital disorders like cerebral palsy, or degenerative diseases like Hawking’s ALS. Many of them rely on text-to-speech machines, typing words that are then vocalized electronically. They sound like computers. And because computers are manufactured in batches of more than one, they also sound like each other.
In August 2002, Rupal Patel, a speech-science professor at Northeastern University, was at a speech-technology conference in Odense, Denmark, to present the results of her latest research. People with dramatic speech impairment, she had found, were still able to control the melody of their voices (also called the “prosody” of a voice, or its pitch, tempo, and volume) even when they couldn’t form words; as a result, many people forewent their communication devices when talking to those closest to them, relying on inflection to help convey meaning.
Walking through the conference’s exhibition hall after her presentation, Patel passed a young woman and older man engaged in conversation, their voices indistinguishable from one another—both were using the same text-to-speech system.
Patel paused, listened. The same sound, she realized, was all around her. People throughout the hall—“nearly half the room,” she recalls—were using nearly identical voices.
“That’s when I put two and two together,” she says. “I thought, well, if they have this part of their voice that’s preserved, maybe I would be able to build a voice for them.”
The idea stayed with her. For the next few years, Patel developed and fine-tuned her process, and in 2007 she received a grant from the National Science Foundation to pursue the project that would become VocaliD (pronounced “vocality”), a for-profit company that creates personalized voices for text-to-speech systems by blending sounds taken from speech-impaired people with words recorded by healthy donors. (The price of a voice, she says, will ultimately depend on demand.)
The company’s technology is based on the “source-filter theory,” which breaks the production of human speech into two components. One is the source, or the sound made by the vibrations of the vocal cords. The other is the filter, or the vocal tract: the path of these vibrations as they echo through the chambers of the neck and head. Conditions that cause speech impairment mainly affect the filter; the prosody of a voice is controlled by the source, which is usually left intact.
To create a voice, Patel says, “we’re taking the filter, the shape of the vocal tract, from the voice donor, and the source from the individual who’s given us something as limited as a vowel.” After taking a short recording from a recipient—who often can only vocalize as much as an “ahhh” sound—the VocaliD team selects a donor with a similar filter and uses a computer algorithm to layer one over the other. Donations come via the company’s “voice bank,” which opened to the public over Thanksgiving weekend. To donate, a person needs a computer, a microphone, and a few hours of time to record the hundreds of sentences Patel has compiled from old stories and common phrases to encompass all of the sounds of the English language.
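For readers curious what "layering" a source over a filter can look like in practice, here is a minimal Python sketch of the general source-filter idea. It is not VocaliD's algorithm, which the article does not describe in detail. The function names, the impulse-train source, and the formant frequencies and bandwidths are all illustrative assumptions: a pitch taken from one speaker drives a chain of simple resonators standing in for another speaker's vocal tract.

```python
import math

def glottal_source(f0_hz, duration_s, sample_rate=16000):
    """Impulse train standing in for vocal-cord vibration (the 'source')."""
    n = int(duration_s * sample_rate)
    period = max(1, round(sample_rate / f0_hz))  # samples between pulses
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, freq_hz, bandwidth_hz, sample_rate=16000):
    """Two-pole resonator approximating one vocal-tract resonance (formant)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, < 1
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize(f0_hz, formants, duration_s=0.5, sample_rate=16000):
    """Drive the 'filter' (a list of formants) with the 'source' (a pitch)."""
    signal = glottal_source(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalize to the range [-1, 1]

# Hypothetical values: a recipient's pitch and a donor's "ah"-like formants.
recipient_pitch_hz = 210.0
donor_formants = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]
samples = synthesize(recipient_pitch_hz, donor_formants)
```

Changing `recipient_pitch_hz` while keeping `donor_formants` fixed alters the melody without altering the vowel quality, which is the separation the theory describes. A real system would estimate both parts from recordings rather than take them as hand-picked numbers.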