Inside Google's Plan to Build a Catalog of Every Single Thing, Ever

There's a lot more to Google's Knowledge Graph than might be apparent from what you see in a casual search.

The ugly truth is that computers don't know anything. They have no common sense.

This idea had been circulating in Metaweb co-founder John Giannandrea's head since 1997 when he was working at Netscape and thinking about how to reveal what you did not know you didn't know on the web. If you were looking at search results for a hiking trail, say, what other hiking trails might you look at? Giannandrea called it "going sideways through the web," and he loved the idea, even if he couldn't execute it back then.

Years later, in 2005, Giannandrea teamed up with Danny Hillis and Robert Cook to cofound Metaweb, which had a simple premise: "What if we could make a catalog of all the stuff our computer should know?" Giannandrea told me in a recent interview. "We were interested in building a model of the world. Our computers are remarkably dumb about the stuff that we take for granted. You learn about stuff. You have some context for understanding. Our computers don't work that way because we don't have any loaded context."

With remarkable confidence (hubris?), he and the other founders said to themselves, "Teaching computers all the discrete stuff in the world seems like it should be doable," so they set out to make a machine-readable catalog of everything in the world.

Last month, their project was finally let loose into the wild as the Google Knowledge Graph, which you now see showing up in your search results on the right of your screen. But there's a lot more to the creation of the Knowledge Graph than might be apparent from using it in casual searches.

This is one of those human knowledge projects that is ridiculous in scope and possibly in impact. And yet when it gets turned into a consumer product, all we see is a useful module for figuring out Tom Cruise's height more quickly. In principle, this is both good and bad. It's good because technology should serve human needs and we shouldn't worship the technology itself. It's bad because it's easy to miss out on the importance of the infrastructure and ideology that are going to increasingly inform the way Google responds to search requests. And given that Google is many people's default portal to the world of information, even a subtle change in the company's toolset is worth considering.

And that's how I found myself on the phone with John Giannandrea discussing mojitos and semantic graphs. "Take the drink called the mojito," he said. "Mojito has ingredients and mint, rum, ice. We'll create a catalog entry for that entity for that human concept 'mojito' and then we'll create a connection between the mojito and its ingredients." The key difference between their catalog and a standard database is that the connection between the mojito and mint is itself an entity, an entity that says, "This thing is an ingredient in this other thing." The edge between the two nouns contains meaning and that makes all the difference. "We can talk about the representation of knowledge with the knowledge itself," Giannandrea said. Whoa, Meta! I thought. Hence, Metaweb.
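To make the idea concrete, here is a minimal sketch in Python of a catalog in which a connection is itself an entry. It is not Metaweb's or Google's actual schema; the `Graph` class, its method names, and the "has ingredient" and "sourced from" relations are all hypothetical, invented only for illustration.

```python
class Graph:
    """Toy catalog (hypothetical design): every edge is also an entity."""

    def __init__(self):
        self.entities = {}  # entity id -> human-readable name
        self.edges = {}     # edge's entity id -> (subject id, predicate id, object id)
        self._next_id = 1

    def add_entity(self, name):
        """Create a catalog entry and return its id."""
        entity_id = self._next_id
        self._next_id += 1
        self.entities[entity_id] = name
        return entity_id

    def connect(self, subject, predicate, obj):
        """Link two entities; the link gets its own entity id."""
        label = " ".join(self.entities[i] for i in (subject, predicate, obj))
        edge_id = self.add_entity(label)
        self.edges[edge_id] = (subject, predicate, obj)
        return edge_id

    def objects_of(self, subject, predicate):
        """Names of everything linked from `subject` by `predicate`."""
        return [
            self.entities[o]
            for s, p, o in self.edges.values()
            if s == subject and p == predicate
        ]


g = Graph()
mojito = g.add_entity("mojito")
has_ingredient = g.add_entity("has ingredient")  # assumed relation name

edge_ids = [
    g.connect(mojito, has_ingredient, g.add_entity(name))
    for name in ("mint", "rum", "ice")
]
ingredients = g.objects_of(mojito, has_ingredient)

# Because an edge is an entity, we can make a statement about the edge itself.
sourced_from = g.add_entity("sourced from")      # assumed relation name
recipe_book = g.add_entity("a recipe book")      # made-up source
meta_edge = g.connect(edge_ids[0], sourced_from, recipe_book)
```

The last three lines are the point: since the mojito-to-mint connection has its own catalog entry, the catalog can record where that fact came from using the same machinery it uses for everything else.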

But there's at least one problem. If you're going to build a catalog of all common sense things in the world, where do you start? The answer was simply, "Somewhere." They added bodies of water and bridges, which go over bodies of water, and highways, which the bridges are a part of, and the length of those highways and the states through which the highways run, and the capitals of those states, and the populations of those capitals, and the population of the United States, and the population of every country in the world, and the dates on which those countries were founded, and so on and so forth and so on and so forth.

They built tools to import data from other sources, so that if they got a database from the French cheese association, they could crank out the sodium levels in those cheeses and also tell you a bit about the regions they came from.

After five long years, they had 12 million objects in the database. And they were purchased by Google. Within a year of the acquisition, that number had grown to 25 million. What did Google bring to the acquisition, aside from money? Data, of course, of a very specific kind. Before, the Metaweb team was just guessing at what people might want to know (cheese, rivers, highways, etc.). With Google's search data, they *know* what users are after, so they can go about finding that information and making it available.

With Google's help, their database has grown rapidly to over 500 million objects. That's orders of magnitude larger than previous attempts to educate artificial intelligences, like the Cyc project, which began at the MCC research consortium in Austin, Texas. (Though it should be noted that Cyc has some capabilities that the Knowledge Graph does not.)

In the end, what is most significant to Giannandrea is that "we're taking a baby step in teaching all our computers at Google something about our human world." As for what comes next, he can't say, but the idea is that it will become a resource that all Google developers can call on, the core of common sense at the center of Google's vast web.

Alexis C. Madrigal

Alexis Madrigal is the deputy editor of TheAtlantic.com. He's the author of Powering the Dream: The History and Promise of Green Technology.

The New York Observer has called Madrigal "for all intents and purposes, the perfect modern reporter." He co-founded Longshot magazine, a high-speed media experiment that garnered attention from The New York Times, The Wall Street Journal, and the BBC. While at Wired.com, he built Wired Science into one of the most popular blogs in the world. The site was nominated for best magazine blog by the MPA and best science website in the 2009 Webby Awards. He also co-founded Haiti ReWired, a groundbreaking community dedicated to the discussion of technology, infrastructure, and the future of Haiti.

He's spoken at Stanford, CalTech, Berkeley, SXSW, E3, and the National Renewable Energy Laboratory, and his writing was anthologized in Best Technology Writing 2010 (Yale University Press).

Madrigal is a visiting scholar at the University of California at Berkeley's Office for the History of Science and Technology. Born in Mexico City, he grew up in the exurbs north of Portland, Oregon, and now lives in Oakland.
