IBM's Watson Memorized the Entire 'Urban Dictionary,' Then His Overlords Had to Delete It


You don't even want to know, they should have explained.


Jeopardy/Alexis C. Madrigal

Humans talk funny. We invent words. We smash words together, tear them apart, abbreviate them one way, then another. Which is great and fun, if you're a human. Not so great if you are a machine or the kind of human who programs machines to understand language.

And so, when IBM's famous artificial intelligence, Watson, he/she/it of Jeopardy-winning fame, was in development, its head researcher, Eric Brown, had a great idea. Humans created this repository of slang, The Urban Dictionary. For example, today on the site, we learn that 'healthy gas' is "the gas (fart) produced from a person who has eaten healthy foods like cabbage, beans, broccolli, grains, or other high fiber, high carbohydrate foods."

Brown realized that this formalization of informal language might be a great way for Watson to understand the way real people communicate. So he and his team fed the whole thing into their AI.

But there was one problem. Informal language has a tendency to be dirty, nasty language. It's insults and cuss words, new names for gross old things, old names for gross new things, etc. And so, we learn from Fortune's Michal Lev-Ram, they had to delete all that human messiness from Watson's memory:

Watson couldn't distinguish between polite language and profanity -- which the Urban Dictionary is full of. Watson picked up some bad habits from reading Wikipedia as well. In tests it even used the word "bullshit" in an answer to a researcher's query.

Ultimately, Brown's 35-person team developed a filter to keep Watson from swearing and scraped the Urban Dictionary from its memory.

Via Ed Yong


Alexis C. Madrigal

Alexis Madrigal is the deputy editor of TheAtlantic.com. He's the author of Powering the Dream: The History and Promise of Green Technology.

The New York Observer has called Madrigal "for all intents and purposes, the perfect modern reporter." He co-founded Longshot magazine, a high-speed media experiment that garnered attention from The New York Times, The Wall Street Journal, and the BBC. While at Wired.com, he built Wired Science into one of the most popular blogs in the world. The site was nominated for best magazine blog by the MPA and best science website in the 2009 Webby Awards. He also co-founded Haiti ReWired, a groundbreaking community dedicated to the discussion of technology, infrastructure, and the future of Haiti.

He's spoken at Stanford, CalTech, Berkeley, SXSW, E3, and the National Renewable Energy Laboratory, and his writing was anthologized in Best Technology Writing 2010 (Yale University Press).

Madrigal is a visiting scholar at the University of California at Berkeley's Office for the History of Science and Technology. Born in Mexico City, he grew up in the exurbs north of Portland, Oregon, and now lives in Oakland.
