Date: 2018-06-27

Somehow speakers of English master these many possible uses of the word “with” without anyone specifically spelling it out for them. At least that’s the case for native speakers; in a class for English as a foreign language, the teacher would likely tease apart these nuances. But what if you wanted to provide the same linguistic education to a machine?

As it happens, just days after Miranda sent his tweet, computational linguists presented a conference paper exploring exactly why such ambiguous language is challenging for a computer-based system to figure out. The researchers did so using an online game that serves as a handy introduction to some intriguing work currently being done in the field of natural language processing (NLP).

The game, called Madly Ambiguous, was developed by the linguist Michael White and his colleagues at Ohio State University. In it, you are given a challenge: to stump a bot named Mr. Computer Head by filling the blank in the sentence “Jane ate spaghetti with ____________.” Then the computer tries to determine which kind of “with” you intended. Playful images drive the point home. In the sentence “Jane ate spaghetti with a fork,” Mr. Computer Head should be able to figure out that the fork is a utensil, and not something that is eaten in addition to the spaghetti.

Ajda Gokcen

Likewise, if the sentence is “Jane ate spaghetti with meatballs,” it should be obvious that meatballs are part of the dish, not an instrument for eating spaghetti.

In addition to these two possibilities, the noun (or noun phrase) following “with” could also indicate manner (“Jane ate spaghetti with gusto”) or company (“Jane ate spaghetti with Mary”).

Mr. Computer Head tries to differentiate among these potential semantic roles in two ways. In basic mode, the program takes a rule-based approach that has traditionally been used in NLP, first zeroing in on the main noun and then looking it up in a semantic database called WordNet. If the noun is classified as an artifact, then the computer guesses the role is instrumental (like fork). If it is a kind of food, then the role is assumed to be part of the dish (like meatballs). If the noun appears to be a kind of feeling, then that fits the manner role (like gusto). If it’s anything else, Mr. Computer Head surmises that the noun refers to the company that Jane is keeping while eating spaghetti.
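The cascade of rules described above can be sketched in a few lines of Python. This is only an illustration, not the game's actual code: the small `TOY_LEXICON` dictionary is a hand-written stand-in for a WordNet lookup, and the names `TOY_LEXICON`, `CATEGORY_TO_ROLE`, and `guess_role` are hypothetical.

```python
# Hand-written stand-in for a WordNet lookup (an assumption for illustration;
# a real system would query WordNet for the noun's category).
TOY_LEXICON = {
    "fork": "artifact",
    "chopsticks": "artifact",
    "meatballs": "food",
    "sauce": "food",
    "gusto": "feeling",
    "joy": "feeling",
}

# Category-to-role rules, as described in the article.
CATEGORY_TO_ROLE = {
    "artifact": "instrument",
    "food": "part of dish",
    "feeling": "manner",
}


def guess_role(head_noun: str) -> str:
    """Guess the role of the noun after 'with'; fall back to 'company'."""
    category = TOY_LEXICON.get(head_noun.lower())
    # Anything that is not an artifact, food, or feeling is treated as company.
    return CATEGORY_TO_ROLE.get(category, "company")


for noun in ["fork", "meatballs", "gusto", "Mary"]:
    print(f"Jane ate spaghetti with {noun}: {guess_role(noun)}")
```

The fallback rule is what makes this approach easy to stump: any noun the lexicon does not classify as an artifact, a food, or a feeling is assumed to be company, whether or not that makes sense.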

In advanced mode, Mr. Computer Head uses a more cutting-edge NLP technique known as word embedding. In this approach, words and phrases are mapped onto a geometrical space known as a vector space, which captures degrees of similarity among different words in a corpus (a large collection of texts). Similar words appear closer to each other in the vector space. Mr. Computer Head matches up the main noun from the input phrase with clusters of words corresponding to the different possible interpretations—say, the instrument cluster, the manner cluster, and so on.
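To make the idea of matching a noun to a cluster concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the four-dimensional vectors are invented by hand (real embeddings are learned from a corpus and have far more dimensions), and comparing a word to each cluster's average vector by cosine similarity is one simple way to do the matching, not necessarily the one Madly Ambiguous uses.

```python
import math

# Invented toy vectors (an assumption; real embeddings are learned from text).
TOY_VECTORS = {
    "fork": (0.9, 0.1, 0.0, 0.1),
    "spoon": (0.8, 0.2, 0.1, 0.0),
    "meatballs": (0.1, 0.9, 0.0, 0.1),
    "cheese": (0.2, 0.8, 0.1, 0.0),
    "gusto": (0.0, 0.1, 0.9, 0.1),
    "delight": (0.1, 0.0, 0.8, 0.2),
    "mary": (0.1, 0.0, 0.2, 0.9),
    "friends": (0.0, 0.1, 0.1, 0.8),
}

# Example words for each interpretation of "with" (hypothetical clusters).
CLUSTERS = {
    "instrument": ["fork", "spoon"],
    "part of dish": ["meatballs", "cheese"],
    "manner": ["gusto", "delight"],
    "company": ["mary", "friends"],
}


def cosine(u, v):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(words):
    """Average the vectors of a cluster's example words."""
    vectors = [TOY_VECTORS[w] for w in words]
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def nearest_role(vector):
    """Return the role whose cluster centroid is most similar to the vector."""
    return max(CLUSTERS, key=lambda role: cosine(vector, centroid(CLUSTERS[role])))


# A made-up vector for an unseen word such as "knife", placed near the utensils.
knife = (0.85, 0.1, 0.05, 0.05)
print(nearest_role(knife))
```

Unlike the rule-based lookup, this approach can make a guess about a word that appears in no hand-built category, as long as the word has a vector that lands near one of the clusters.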