IBM's Watson Memorized the Entire 'Urban Dictionary,' Then His Overlords Had to Delete It

You don't even want to know, they should have explained.

Image credit: Jeopardy/Alexis C. Madrigal

Humans talk funny. We invent words. We smash words together, tear them apart, abbreviate them one way, then another. Which is great and fun, if you're a human. Not so great if you are a machine or the kind of human who programs machines to understand language.

And so, when IBM's famous artificial intelligence, Watson, he/she/it of Jeopardy-winning fame, was in development, its head researcher, Eric Brown, had a great idea. Humans created this repository of slang, The Urban Dictionary. For example, today on the site, we learn that 'healthy gas' is "the gas (fart) produced from a person who has eaten healthy foods like cabbage, beans, broccolli, grains, or other high fiber, high carbohydrate foods."

Brown realized that this formalization of informal language might be a great way for Watson to understand the way real people communicate. So he and his team fed the whole thing into their AI.

But there was one problem. Informal language has a tendency to be dirty, nasty language. It's insults and cuss words, new names for gross old things, old names for gross new things, etc. And so, we learn from Fortune's Michal Lev-Ram, they had to delete all that human messiness from Watson's memory:

Watson couldn't distinguish between polite language and profanity -- which the Urban Dictionary is full of. Watson picked up some bad habits from reading Wikipedia as well. In tests it even used the word "bullshit" in an answer to a researcher's query.

Ultimately, Brown's 35-person team developed a filter to keep Watson from swearing and scraped the Urban Dictionary from its memory.

Via Ed Yong


Alexis C. Madrigal

Alexis Madrigal is the deputy editor of TheAtlantic.com. He's the author of Powering the Dream: The History and Promise of Green Technology.

The New York Observer has called Madrigal "for all intents and purposes, the perfect modern reporter." He co-founded Longshot magazine, a high-speed media experiment that garnered attention from The New York Times, The Wall Street Journal, and the BBC. While at Wired.com, he built Wired Science into one of the most popular blogs in the world. The site was nominated for best magazine blog by the MPA and best science website in the 2009 Webby Awards. He also co-founded Haiti ReWired, a groundbreaking community dedicated to the discussion of technology, infrastructure, and the future of Haiti.

He's spoken at Stanford, Caltech, Berkeley, SXSW, E3, and the National Renewable Energy Laboratory, and his writing was anthologized in Best Technology Writing 2010 (Yale University Press).

Madrigal is a visiting scholar at the University of California at Berkeley's Office for the History of Science and Technology. Born in Mexico City, he grew up in the exurbs north of Portland, Oregon, and now lives in Oakland.
