Before Barack Obama moved into the White House, he was just another autocorrect joke. Microsoft Word 2003 gently suggested that typists might mean “boatman” or “Osama” when spelling out “Obama,” and early versions of Word 2007 did the same. The president later made it into Word’s dictionary, so the program now recognizes his name as a regular English word. But beyond the leader of the free world, who do tech companies choose to include in their autocorrect dictionaries, particularly on smartphones and tablets? And how do they keep those dictionaries up to date with current events, trends, and rising celebrities?
To answer this question, a short primer on autocorrect helps. In 1993, before anyone had a smartphone (or even a “dumb” phone, for that matter), Microsoft rolled out a new feature in its word processor: autocorrect, a program that fixed typos by guessing which words users meant to type. A couple of years later, a small group of programmers started working on what would become T9, the predictive software originally used on phones with a simple numeric keypad (T9 stands for "Text on 9 Keys"). These programmers started out with a different goal: helping people with disabilities communicate more easily, since typing on small keyboards was hard for people with limited motor control. They soon realized their invention was useful to a much broader audience of texters. Their company, Tegic, became known for supplying word-prediction software for most handheld devices, and the software evolved along with the phone as most models ditched numeric keypads for QWERTY keyboards.
Today, the language models on phones have become far more sophisticated, automatically suggesting whole words and even phrases as users start typing. These suggestions come from algorithms that analyze how people use language, mostly on the Internet.
But when it comes to proper nouns, this process gets a lot trickier. Can anyone with a Wikipedia page get inducted into autocorrect? Do the ranks of the Internet-famous, like YouTube star Jenna Marbles, have a better chance of showing up in my text messages than less web-savvy notables like Jonathan Franzen? And how do autocorrect dictionaries stay up to date with the current events and local celebrities people might be talking about most?
Nuance, the company that eventually bought Tegic, relies on data from consumers to find the fastest-growing vocabulary trends. They provide software for about 60 percent of the world’s handheld devices, including devices made by Samsung, Sony, and LG, said Aaron Sheedy, Nuance’s vice president of mobile devices. About 30 percent of their users have opted in to a program called “Living Language,” giving the company permission to collect data on the words they manually add to their phone’s dictionary. “That’s how you get ... stuff that doesn’t show up on the Internet,” Sheedy said. “The Internet has got all of this really cool stuff, but it doesn’t pick up your super colloquial text speak, new idioms, new acronyms, stuff that hasn’t even hit the public news sector.”
Their programs don’t search the whole Internet for new trends, though. “We’ve got about 150 global sites where we look for specifically new words that come out,” Sheedy said. One interesting question is how Nuance picks the 150 websites it uses to find new words. The few examples Sheedy gave were all mainstream media and technology sites, including Sports Illustrated, Time, Amazon, and CNN. He maintained that his team doesn’t editorialize the list of people who are famous enough to make it into their predictive software, but the Internet is a big place. By picking out 150 sites to search for information about important trends in people, inventions, and events, they are making a judgment about which voices define cultural significance.
They also have computer programs that crawl the Internet for new words, which was how they decided to add Pope Francis to their autocorrect dictionary. “That’s how we got the word Bergoglio, right around the time the pope was being elected,” Sheedy said. “We said, look, this word is now generating massive amounts of traffic, statistically ranking way higher than it ever did before. Let’s push this out in our next update.”
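Sheedy doesn’t describe the statistics behind that decision, but the basic idea of spotting a word that is “ranking way higher than it ever did before” can be sketched in a few lines. The following is a minimal sketch, not Nuance’s method, assuming word counts have already been gathered for a historical baseline period and a recent window; the function name `detect_spikes` and the thresholds `min_ratio` and `min_count` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def detect_spikes(baseline, recent, min_ratio=10.0, min_count=50):
    """Flag words whose share of recent text far exceeds their historical share.

    Hypothetical sketch: baseline and recent are Counters mapping word -> count;
    min_ratio and min_count are illustrative thresholds, not real product settings.
    """
    base_total = sum(baseline.values()) or 1
    recent_total = sum(recent.values()) or 1
    spikes = []
    for word, count in recent.items():
        if count < min_count:
            continue  # too rare in the recent window to trust the estimate
        recent_rate = count / recent_total
        # Add-one smoothing so a brand-new word (baseline count 0) gets a small
        # nonzero historical rate instead of causing a division by zero.
        base_rate = (baseline[word] + 1) / (base_total + len(baseline) + 1)
        ratio = recent_rate / base_rate
        if ratio >= min_ratio:
            spikes.append((word, ratio))
    # Biggest jumps first: candidates to push out in the next dictionary update.
    return sorted(spikes, key=lambda pair: pair[1], reverse=True)
```

Given a baseline in which “Bergoglio” never appears and a recent window in which it shows up hundreds of times, a function like this flags it, while a common word such as “pope,” whose share merely doubles, stays below the threshold. Comparing rates rather than raw counts keeps everyday words like “the” from dominating the list.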
Unsurprisingly, Sheedy talked a little trash about the other major makers of predictive language software, Microsoft and Apple, claiming their models lack functionality. While Apple declined to comment on this story, a representative from Microsoft said the company relies on even fewer sources to generate the list of public figures who show up on its users’ phones: mostly Twitter and Wikipedia.
“As fun as it would’ve been to read Twitter and Wikipedia all day, we instead built programs to ‘crawl’ these sites for new and commonly used words,” the Windows Phone team wrote in a 2012 blog post. The team also tailors its predictive software to different languages. “Of course, what’s popular in the U.S. may not mean anything in Russia, so we built Word Flow uniquely for each language. What’s also great about using Twitter and Wikipedia is they help us build dictionaries that are appropriate to specific countries or dialects,” the post said.
Microsoft uses stats like the number of Twitter followers a person has to determine who gets included in its dictionaries (showing up in autocorrect is the new best reason to try to become Twitter-famous). The company also collects anonymous data from users’ phones to find new words that are being typed a lot.
But most surprisingly, the company’s programmers moonlight as informal editors of Microsoft’s autocorrect dictionary. “We can make our own cultural decisions on what to include,” wrote the company’s representative, John Lord, in an email. “Many of our team’s favorite video games, writers, and much more were added in this manner.”
People probably don’t form opinions about who’s really famous and important based on the names their autocorrect suggests. But it’s still a little surprising that, at least for Windows Phone users, the who’s who of autocorrect is shaped by a bunch of programmers in some office outside of Seattle.