Before Barack Obama moved into the White House, he was just another autocorrect joke. Microsoft Word 2003 gently suggested that typists might mean “boatman” and “Osama” when spelling out “Obama,” and early versions of Word 2007 did the same. The president later made it into Word’s dictionary, meaning that the program now recognizes his name as a regular English word. But other than the leader of the free world, who do tech companies choose to include in their autocorrect dictionaries, particularly on smartphones and tablets? How do they keep their dictionaries up-to-date with current events, trends, and rising celebrities?
To answer this question, a short primer on autocorrect might be helpful. In 1993, before anyone had a smartphone (or even a “dumb” phone, for that matter), Microsoft rolled out a new feature in its word processor: autocorrect, which corrected typos by guessing what words users meant to type. A couple of years later, a small group of programmers started working on what would later become T9, the predictive software originally used on phones with a simple numeric keypad (T9 stands for “Text on 9 Keys”). These programmers started out with a completely different goal: helping people with disabilities use technology to communicate more easily, since typing on small keyboards was challenging for people with limited motor control. They soon realized their invention was useful for a much broader audience of texters. Their company, Tegic, became known for providing word-prediction software for most handheld devices, evolving along with the phone as most models ditched the traditional numeric keypads for QWERTY keyboards.
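For readers curious how nine-key prediction can work in principle, here is a minimal sketch in Python. It is not Tegic's actual implementation, which the article does not describe. It assumes the standard phone keypad layout, and the function names (`word_to_digits`, `build_index`, `candidates`) and the sample word counts are invented purely for illustration. The idea is that each word maps to exactly one digit sequence, so the phone can look up every dictionary word that shares the keys pressed and offer the most common one first.

```python
from collections import defaultdict

# Standard phone keypad letter groups (keys 2 through 9).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def word_to_digits(word):
    """Return the key sequence a user would press to type `word`."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def build_index(word_counts):
    """Group words by key sequence, most frequent first (ties alphabetical)."""
    groups = defaultdict(list)
    for word, count in word_counts.items():
        groups[word_to_digits(word)].append((count, word))
    return {
        digits: [w for _, w in sorted(entries, key=lambda e: (-e[0], e[1]))]
        for digits, entries in groups.items()
    }


def candidates(index, digits):
    """Return every known word that matches the pressed keys."""
    return index.get(digits, [])


# Hypothetical usage counts, made up for this example.
SAMPLE_COUNTS = {"home": 50, "good": 40, "gone": 30, "hood": 5, "hello": 20}
INDEX = build_index(SAMPLE_COUNTS)

print(candidates(INDEX, "4663"))
```

In this toy dictionary, four different words share the key sequence 4663, which is why a nine-key system needs some ranking rule to decide which one to show first.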
Today, the language models on phones have become a lot more sophisticated, automatically suggesting whole words and even phrases when users start typing. These suggestions are made based on algorithms that analyze how people use language, mostly on the Internet.
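As a rough illustration of what suggesting words from observed usage could look like, the sketch below counts words in a sample text and completes a typed prefix with the most frequent matches. This is an assumption-laden toy, not how any vendor's language model actually works: the `train` and `suggest` functions and the sample sentence are hypothetical, and real systems draw on far larger text collections and on context, not just prefixes.

```python
import re
from collections import Counter


def train(corpus):
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z']+", corpus.lower()))


def suggest(counts, prefix, k=3):
    """Return up to k known words starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    matches = [(w, c) for w, c in counts.items() if w.startswith(prefix)]
    matches.sort(key=lambda wc: (-wc[1], wc[0]))  # ties broken alphabetically
    return [w for w, _ in matches[:k]]


# Hypothetical sample text standing in for a much larger body of writing.
SAMPLE_TEXT = "the president spoke to the press and the president left"
COUNTS = train(SAMPLE_TEXT)

print(suggest(COUNTS, "pre"))
```

A word that never appears in the training text can never be suggested, which hints at why a name absent from the sources a company samples is unlikely to surface as a suggestion.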
But when it comes to proper nouns, this process gets a lot trickier. Can anyone with a Wikipedia page get inducted into autocorrect? Do the ranks of the Internet-famous, like YouTube star Jenna Marbles, have a better chance of showing up in my text messages than less web-savvy notables like Jonathan Franzen? And how do autocorrect dictionaries stay up-to-date with the current events and local celebrities people might be talking about most?
Nuance, the company that eventually bought Tegic, relies on data from consumers to find the fastest growing vocabulary trends. They provide software for about 60 percent of the world’s handheld devices, including devices made by Samsung, Sony, and LG, said Aaron Sheedy, the vice president of mobile devices. About 30 percent of their users have opted in to a program called “Living Language,” giving the company permission to gather data on the words they manually add to their phone’s dictionary. “That’s how you get ... stuff that doesn’t show up on the Internet,” Sheedy said. “The Internet has got all of this really cool stuff, but it doesn’t pick up your super colloquial text speak, new idioms, new acronyms—stuff that hasn’t even hit the public news sector.”
Their programs don’t search the whole Internet for new trends, though. “We’ve got about 150 global sites where we look for specifically new words that come out,” Sheedy said. One interesting question is how Nuance picks the 150 websites it uses to find new words. The few examples Sheedy gave were all mainstream media and technology sites, including Sports Illustrated, Time, Amazon, and CNN. He maintained that his team doesn’t editorialize the list of people who are famous enough to make it into their predictive software, but the Internet is a big place. By picking out 150 sites to search for information about important trends in people, inventions, and events, they are making a judgment about which voices define cultural significance.