Today, Google announced that its translation engine, which is premised on simple machine learning techniques multiplied by vast volumes of data, now serves 200 million users a day. The scale of the service spins out some crazy stats about Google's role in language today. Here's Franz Och, a research scientist at the company:
In a given day we translate roughly as much text as you'd find in 1 million books. To put it another way: what all the professional human translators in the world produce in a year, our system translates in roughly a single day. By this estimate, most of the translation on the planet is now done by Google Translate.
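The "simple techniques multiplied by vast data" premise can be sketched at toy scale: count which target-language words co-occur with which source-language words across a parallel corpus, then let the counts vote. This is an illustrative sketch only, assuming a tiny made-up Spanish-English corpus; Google's actual pipeline is far more elaborate (phrase tables, language models, reordering), but the spirit is the same: no dictionaries, no grammar rules, just statistics over data.

```python
from collections import Counter, defaultdict

# Toy parallel corpus: (source, target) sentence pairs. Real systems learn
# from millions of such pairs; the principle is identical.
corpus = [
    ("la casa azul", "the blue house"),
    ("la casa", "the house"),
    ("casa azul", "blue house"),
]

# Count how often each source word co-occurs with each target word
# across aligned sentence pairs.
cooc = defaultdict(Counter)
for src, tgt in corpus:
    for s in src.split():
        for t in tgt.split():
            cooc[s][t] += 1

def translate_word(word):
    """Guess a translation: the target word that co-occurs most often."""
    return cooc[word].most_common(1)[0][0]

# "casa" appears alongside "house" in all three pairs, but alongside
# "the" and "blue" only twice each -- the data picks the right answer.
print(translate_word("casa"))  # -> house
```

With more pairs, the spurious co-occurrences ("casa" with "the") get washed out by sheer volume, which is exactly why the more-data strategy worked so well for so long.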
Of course, Och pays lip service to human translators "for nuanced or mission-critical translations," but when you just want to know the basics of what someone is saying, Google does the trick. Machine translation really is an amazing service, and one that's easy to take for granted now that we have it.
A key question over the next six years is how far Google's current techniques can take them. The strategy for the last six years has been constant: MORE DATA. But even Peter Norvig, head of Google Research, admits that there are diminishing returns to the more-data game. Certainly, it doesn't appear that just adding more data is going to yield Gary Snyder's translations of Chinese poetry. Eventually, it seems to me, Google (or any other translation software) will have to start understanding, in some way, the semantic content of the words it is arranging. And that's a much harder AI problem than the one that brought you the wonders of Google Translate.