From Your Lips to Your Printer

Finally, voice-recognition software that (almost) lives up to its promise to liberate those unable or unwilling to type.


FOR years I knew exactly what a computer would have to do to make itself twice as useful as it already was. It would have to show that it could accurately convert the sound of spoken language to typed-up text. I had a specific chore in mind for such a machine. I would give it the tape recordings I make during interviews or while attending speeches, and it would give me back a transcript of who said what. This would save the two or three hours it takes to listen to and type up each hour's worth of recorded material.

This machine would have advantages for other people, too. It would help groups that want minutes of their meetings or brainstorming sessions, legal professionals who need quick transcripts of what just happened at trials, students in big lecture halls, people who want to dictate e-mail while stuck in traffic, and those who, owing to disability or stress injury, are not able to type.


For years I despaired that such a machine would ever exist. The demonstrations I saw at computer shows, starting in the mid-1980s, left me with the impression that the speech-text barrier in technology was as formidable as the blood-brain barrier long seemed to be in medicine. At the shows the creator of each new system would carefully utter a phrase, which the computer would faithfully render on its screen. But if someone in the audience asked to see the computer handle a different phrase, or if someone with a different voice tried the same phrase, the system would be stumped. The demo person would start talking about the great new version that would be available next year.

Hardened by this experience, I hesitate to say what I'm about to, but here it is: the great new version may have arrived -- or at least a significantly better version. It doesn't do what I dream of, yet, but it does do important things well.

People within the computing industry are mainly excited about the business potential of "embedded" voice-recognition technology. This ranges from the familiar speech options in voice-mail systems ("To keep holding forever, please press or say 'two'") to hand-held devices that will record spoken appointments or phone numbers. Embedded systems have a very wide range of potential uses, and they're technically easier to pull off than full "dictation" systems, which aspire to let the user say anything he might otherwise enter on a keyboard. They're easier because the options the system has to consider are limited: after the voice-mail system asks you to press or say "two," it doesn't have to be able to distinguish "two" from "to" or "too." It needs only to know that all of them, plus "dew" and "do," sound similar -- and different from "four," "for," and "pour" or "three," "tree," and "the."
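To make the contrast concrete, here is a toy sketch in Python. It is not how any real voice-mail product works. It assumes a hypothetical three-option phone menu in which each option is listed along with its sound-alike words, so the system only has to decide which group an utterance falls into. Spelling similarity, computed with the standard library's `difflib`, stands in very crudely for acoustic similarity. `MENU` and `pick_option` are illustrative names, not part of any product.

```python
from difflib import SequenceMatcher

# Hypothetical menu grammar: each option key lists words that sound alike.
# A real system would compare sounds, not spellings; this is only a stand-in.
MENU = {
    "2": ["two", "to", "too", "dew", "do"],
    "3": ["three", "tree", "the"],
    "4": ["four", "for", "pour"],
}


def similarity(a: str, b: str) -> float:
    """Crude stand-in for acoustic similarity: spelling overlap in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def pick_option(heard: str, menu: dict = MENU) -> str:
    """Return the menu key whose sound-alike group best matches `heard`.

    The system never decides whether the caller meant "two," "to," or "too";
    it only decides which group of similar-sounding words was closest.
    """
    heard = heard.lower()
    best_key, best_score = None, -1.0
    for key, variants in menu.items():
        score = max(similarity(heard, v) for v in variants)
        if score > best_score:
            best_key, best_score = key, score
    return best_key


if __name__ == "__main__":
    for word in ["too", "Do", "for", "tree"]:
        print(word, "->", pick_option(word))
```

Whichever of "two," "to," or "too" the caller seems to say, the answer is the same option, so the menu never has to settle the spelling question that a dictation program faces on nearly every word.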

What I find exciting is the debut of the first plausible dictation technology. It comes from Dragon Systems, of Newton, Massachusetts, and it's called Dragon NaturallySpeaking. Dragon has been a small but admired contender in this field for more than a decade; this year it was acquired by Lernout & Hauspie, a Belgian firm that has battled IBM for overall leadership in commercial speech-recognition technology. With Version 5 of NaturallySpeaking, released in August, Lernout & Hauspie has gained an edge in dictation technology. Now I know that if my hands stopped working, I could still at least compose e-mail.

There are three leading dictation systems, and it's easy to try each one for yourself, because each comes with a thirty-day money-back guarantee. NaturallySpeaking Preferred costs $199; ViaVoice Advanced Edition, from IBM, costs $99.95; and Voice Xpress Advanced (which I did not review), also from Lernout & Hauspie, costs $79. What they offer and how they work are very similar. Each comes with a CD for installation, a detailed instruction manual (and on-screen tutorial), and a telephone-operator-style headset and microphone. You plug the headset cord into the sound card or audio port of your computer (something all modern systems have). The headset is designed to keep the microphone very close to your mouth, where it needs to be for accurate recognition.

The two programs I tested both require a lot of processing speed and disk space. They work better and faster if they can load most of their reference data onto your hard disk rather than having to read it from the CD, so you should have at least 300 megabytes of disk space free for installation. Both ran acceptably on my three-year-old Pentium II computer, but they are said to be significantly faster on a Pentium III, whose added instructions are designed to speed up sound processing. Each program requires you to begin by spending ten to thirty minutes reading sample text aloud to the computer, so that it can be "trained" in the patterns of your voice, and each allows briefer, incremental training sessions to refine recognition as you go on.

The main difference between the programs, at least for me, is that Dragon's just works better. To be more precise, its recognition rate is high enough that I willingly made the small adjustments in my working style necessary to use the system. The payoff for learning to work the IBM system was too low. At the end of the first day I spent trying the Dragon program, it recognized nearly everything I said, and I had little trouble persuading it that some instructions -- for example, "go to end of line" -- were meant to control the program itself rather than to be typed out. ViaVoice and I seemed to be fighting each other, and after a week I put it away. Dragon has also been the consistent winner in computer-magazine reviews.

James Fallows is a national correspondent for The Atlantic and has written for the magazine since the late 1970s. He has reported extensively from outside the United States and once worked as President Carter's chief speechwriter. His latest book is China Airborne.