A Secret Code in Google Translate?

A glitch in translating Latin placeholder text is sparking conspiracy theories.

Earlier this week security reporter Brian Krebs pointed out an odd glitch in Google Translate. It had to do with the service’s treatment of “Lorem Ipsum” placeholder text—the string of Latin words that people use to block out space for text on websites and in other designs before meaningful verbiage is added.
For some reason, strings of “Lorem Ipsum” were coming back as “NATO.” In his post, Krebs works through a few examples and posits a few explanations. Perhaps someone is gaming the translate system for fun, or to get around Chinese censorship laws.
Could it be a code hidden in plain sight?
Even before Krebs finished the post, Google had changed its translation algorithm to make reproducing these results impossible. Now, rather than “lorem” returning “China,” Google Translate simply throws “lorem” right back at you. And, for its part, Google responded cheekily with a Tweet. Garbage in, garbage out, they said. (Google turned down my request for an interview, dismissing the translation as a technical snafu.) But for some that’s not quite good enough—and the assumption that Google is hiding something rather than simply failing at translation says a lot about how we see the Internet giant.
“I’m not a tinfoil hat kind of guy for the most part,” Krebs told me, “but it was very clear that the tinfoil hat people were going to have a field day with this.” And in some ways it's the perfect conspiracy theory, because you can't prove what's going on either way. Without Google’s help—which they haven’t yet offered—there’s no way to know why the translate algorithm connected “lorem lorem” to “China’s Internet.”
And translating “lorem” to “China” does seem like something more than just garbage in, garbage out. It may not be the dark Internet, but it also doesn’t seem to be entirely random. One explanation could have to do with the text the algorithm uses to generate its translations. Google Translate works by drawing from vast banks of text, searching for patterns in language use to match future translation requests. Some of those texts include documents from the United Nations and the European Union that have to be translated into multiple languages. It’s possible that if either entity uses lorem ipsum placeholder text in a document, Google might think that it’s looking at the “Latin translation” of the text.
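To make that explanation concrete, here is a minimal sketch of how pattern-matching over paired texts could latch onto placeholder text. It is a toy illustration, not Google's actual system: the three-sentence corpus, the function names `train` and `translate_word`, and the choice of the Dice coefficient as the association score are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy "parallel corpus": (placeholder Latin, English text that
# happened to sit beside it in a document). Invented for illustration only.
TOY_PAIRS = [
    ("lorem ipsum dolor", "china internet report"),
    ("lorem sit amet", "china trade summary"),
    ("ipsum dolor sit", "internet policy report"),
]

def train(pairs):
    """Count how often each source word shares a sentence pair with each target word."""
    source_freq = Counter()
    target_freq = Counter()
    pair_freq = defaultdict(Counter)
    for source, target in pairs:
        s_words = set(source.lower().split())
        t_words = set(target.lower().split())
        source_freq.update(s_words)
        target_freq.update(t_words)
        for s in s_words:
            pair_freq[s].update(t_words)
    return source_freq, target_freq, pair_freq

def translate_word(model, word):
    """Return the target word most strongly associated with `word`.

    Words never seen in training are echoed back unchanged.
    """
    source_freq, target_freq, pair_freq = model
    word = word.lower()
    if word not in pair_freq:
        return word

    def dice(t):
        # Dice coefficient: co-occurrences relative to how common each word is.
        return 2 * pair_freq[word][t] / (source_freq[word] + target_freq[t])

    return max(pair_freq[word], key=dice)

model = train(TOY_PAIRS)
print(translate_word(model, "lorem"))        # word seen next to "china" every time
print(translate_word(model, "consectetur"))  # unseen word is returned as-is
```

With so little data, "lorem" appears beside "china" in every pair that contains it, so the sketch confidently pairs the two. A word it has never seen simply comes back unchanged.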
Another potential culprit could be programmers involved with the DefCon Badge project, teams who spend hours hacking projects and puzzles. "If anybody was going to go through the trouble of trying to game the results it would be those guys," Krebs says.
While it's possible that something like this could happen randomly based on the law of large numbers and just how much text Google Translate is dealing with, not everybody is convinced this is accidental. "Things like this in isolation are very unlikely," says Pedro Domingos, a machine-learning researcher at the University of Washington. And he points out that tricking Google into encrypting your own cypher like this wouldn't be impossible—it simply involves putting up a wall of dummy text and its translation for Google to trawl and learn from. "My guess would be that there is something non-accidental here. Exactly what it was we may not ever find out."
The real answer is probably that Google Translate simply isn’t perfect. Krebs is relatively convinced that this is just a blip in machine learning—that the algorithm simply doesn’t have enough new Latin documents to pull from to help it make sense of Latin text. So when we feed it nonsense text it does the best it can to make meaning from it—to find the connections it thinks we’re seeking from the bank of information it has. “It doesn’t have enough to go on, and in an attempt to impress its creators, it's trying to figure it out on its own,” he says.
Humans are good at this kind of patterning and meaning making out of nonsensical data, too. "Lorem ipsum" is used because it is meaningless, but we assume the information we get back from Google must be meaningful, so we try to map what it means back onto the results we’ve got. Which is how we end up here—wondering whether a failure in Google Translate is actually a secret Chinese code.
Then again, maybe it is. Krebs reminds me of the famous Joseph Heller quote: “Just because you're paranoid doesn't mean they aren't after you.”