Worlds Unknown: The Regions Ignored by Google Translate

Google's translation tool has the power to open up parts of the world we don't understand well -- if it just had the right languages.


Last week Google Translate announced that it now has more than 200 million monthly users. As Alexis Madrigal noted, this means that Google is now translating as much in a day as all professional human translators combined complete in a year -- an amount of text equivalent to a million books.

Google Translate is far from perfect -- its garbled prose, creative grammar and bizarre word substitutions have been dubbed "Dada processing" -- but it is one Google product about which one can unequivocally say that it does more good than harm. Because of Google Translate, millions of people access ideas that would once have remained impenetrable. The default dismissal of foreign media is gone.

Since its inception in 2006, Google Translate has added 65 languages from areas extending across much of the world, though two exceptions stand out: Central Asia and sub-Saharan Africa. No languages from Central Asia -- such as Pashto, Uzbek, and Uyghur -- make the Google cut. Neither do the African languages Hausa, Yoruba, or Zulu. The sole inclusions from sub-Saharan Africa are Swahili and Afrikaans.

Translation has the potential to shift the politics of perception. Here Central Asia and sub-Saharan Africa share something else in common: They are already on the losing side of this game. These two regions are the least likely to be covered by the international media and the most likely to be dismissed as barbaric, obscure or irrelevant by non-specialists.

Political scientist Laura Seay recently wrote about the problems that plague media coverage of Africa: the paucity of international correspondents, the callous framing of tragic events, the echo chamber of repeated coverage, the tendency to ignore regional diversity, the casual racism and condescension. These practices mark Central Asian media coverage as well. "If it takes hundreds of deaths or a revolution to make you report on a country, don't cover its 'humorous' political and economic failures," Matthew Kupfer wrote last week on Registan, bemoaning the international media mockery of Kyrgyzstan's inability to pay its bills.

Central Asia and sub-Saharan Africa share another important similarity: their events are framed through the languages of colonization. Most international coverage of Central Asia is written by people who speak Russian. Similarly, sub-Saharan Africa is covered by speakers of English, French and Arabic. This is not the fault of the reporters: when bureaus assign so few people to such large regions, one cannot reasonably expect them to know all the local languages, and so it makes sense for them to rely on a lingua franca. But there is so much room to do better, particularly with the tools of the Internet.

It is increasingly common for news agencies to cover a country by reprinting claims made online. This is why CNN films Facebook pages and why complex conflicts get reduced to "Twitter revolutions." Without an ability to translate local languages, reporters rely on whatever material they can understand -- meaning that, to give one example, Russian-language content is often used to represent what is going on in Uzbek communities. Internet content created by speakers of languages like Uzbek (often preferred by citizens of Uzbekistan even if they do know Russian) is ignored.

As a result, important insights and debates remain invisible to the outside world. "There is another Internet, a secret Internet, in which meaningful political conversations take place in Uzbek, Kyrgyz, Kazakh, Turkmen, and Tajik, yet the majority of the world remains none the wiser," I wrote in 2010. This is as true today as it was then.

Google is moving in the right direction: Translation for Kazakh and Kyrgyz is being developed, and Google Africa is soliciting contributions from Africans interested in expanding the site's capabilities. One hopes that these additions will increase the regional knowledge necessary to write with depth and compassion. Nothing substitutes for human translation, especially of Central Asian and African websites filled with jokes, idioms and poetry. But Google Translate can give a sense of what people are concerned about, which may help shift coverage away from the trivialities and biases cited by Seay. Moreover, it allows citizens who speak only regional languages to access foreign media and translate their own works for a broader audience -- a feat that the excellent Global Voices has achieved on a more selective scale. For citizens involved in politics or international affairs, this is an invaluable gift.

"There is never interpretation, understanding and knowledge when there is no interest," Edward Said wrote in Covering Islam, critiquing media bias toward the Muslim world. It is hard to create interest in places -- like Central Asia and sub-Saharan Africa -- that most do not consider as individual, complex entities. In the digital era, impenetrable means invisible; invisible means irrelevant. Adding more local and national languages to Google Translate is a small step toward remedying this problem.

This post originally appeared at and is reproduced with permission.