How to Teach Google What a Story Is
Deep inside Google, a small team has been trying to solve a problem that's easy for any schmuck around the watercooler but frighteningly difficult for the world's most data-rich company: telling a story.
Google wanted to solve a problem we can all understand. People take so, so many photographs and yet they actually do very little with them. A chosen few are posted to Instagram. Most sit in vast wastelands of thumbnails on phones or in iPhoto, never to be seen after the moment of their creation.
"You come back from a trip with 300 photos and no one is trying to help you do anything with them," said Google social web engineer Joseph Smarr. "You think about how people deal with that, and the main way is to not share anything. The second biggest thing is to share one little vignette or Instagram. Or the worst thing is they dump the whole 300 photos in an album. And that doesn't tell a story in a meaningful way. It's just a series of pictures. It's just a monotone drum beat with no fills: boom-boom-boom-boom."
So Smarr and his teammates—product designer Brett Lider and user experience designer Clement Ng—set a task for themselves. They wanted to create software that would have rhythm and flow like "actual storytelling." Actual human storytelling.
Their solution is available to all Google users as one of Google+'s genuinely awesome photo tools. (One can poke fun at Google+'s foibles, but they nailed photos.)
The product is called Stories, and it takes photos users upload and automatically packages them up into narratives.
Maybe that sounds easy, but that's because you're a human. Teaching a machine a sense of narrative and place isn't quite so easy, even using all of the information that Google knows about a user.
So, I spent time with the Google team that built Stories. I learned how they did it and began to consider what that says about computers' ability to understand the human world enough to help us live in it.
The early prototypes look nothing like the finished product.
At first, Smarr and Lider created something that looked more like a personal report card, a bunch of data compiled like the nerd-famous "annual reports" produced by one-time Facebook designer Nicholas Felton. A May 2012 design mockup features Lider's check-ins and hiking stats, interactions with people, and musical choices. It's a fairly comprehensive and detailed set of information about a person, artfully chosen and arranged. The idea was to create something like a Facebook News Feed, only for a single person: an algorithmic distillation of your own personal news.
But that was just a mockup. When they began to see what they could actually create with all of Google's data about its users and all its processing might, they discovered something: "Our history is noisy and incomplete," Smarr said. At the same time, they were honing their concept. And they kept coming back to photographs.
Lider ran user group studies, asking people to talk about each of the last ten photographs that they took. Why'd they take it? Who was it for?
There were three broad categories. The first was obvious: they took the photograph for someone. The second category makes sense, too: people used photographs for memory augmentation, to remember a beer they'd liked or a place they wanted to come back to. The third category, though, was "documentation of adventures." And when they asked people how often they actually used the pictures that they took, "recorded adventures had the lowest percentage of people acting upon their intentions," Lider said.
The first two categories have obvious apps and services associated with them. Every messaging app in the world helps people send pictures they've taken for someone. And Evernote, among others, exists to help people remember things.
But adventure recording? That didn't have an App Store-leading app. In fact, people tended to blame themselves for not using those photographs they'd taken. They'd say they were lazy for not getting the photos to friends or into a form that they themselves could enjoy.
Smarr, Lider, and Ng began to sense an opportunity. Lider ran more user tests. He had people come in and play with printouts of photographs.
He'd literally ask them to lay the prints out on a table. Most people organized the images left to right in a chronological strip. They clustered photos from the same place together, and even placed an "establishing shot" of that place at the beginning of each location section.
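That hand layout is, in effect, a simple algorithm: sort by time, then group consecutive shots from the same place, with the first photo of each run serving as the establishing shot. A minimal sketch of that rule, with field names and structure assumed purely for illustration (the article doesn't describe Google's actual data model):

```python
# Illustrative sketch of the layout people produced by hand: sort photos
# chronologically, then cluster consecutive shots taken at the same place.
# The field names ("file", "timestamp", "place") are assumptions, not
# anything from Google's implementation.
from itertools import groupby

def layout_strip(photos):
    """Return location sections, each a chronological run from one place."""
    ordered = sorted(photos, key=lambda p: p["timestamp"])
    sections = []
    for place, run in groupby(ordered, key=lambda p: p["place"]):
        shots = list(run)
        # the first photo of each place acts as the "establishing shot"
        sections.append({
            "place": place,
            "establishing_shot": shots[0]["file"],
            "photos": [p["file"] for p in shots],
        })
    return sections

photos = [
    {"file": "img3.jpg", "timestamp": 3, "place": "Brooklyn Bridge"},
    {"file": "img1.jpg", "timestamp": 1, "place": "Flatiron"},
    {"file": "img2.jpg", "timestamp": 2, "place": "Flatiron"},
]
print(layout_strip(photos))  # two sections: Flatiron (2 shots), then Brooklyn Bridge
```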
It's not a photo album. It's not a collage. It's... a kind of narrative biography, Lider realized. So he started looking into the history of that art. This research led to the project's codename: Project Boswell, after James Boswell, Samuel Johnson's biographer. (Seriously!)
"The moment in history we focused in on was when narrative biographies started coming out in the 19th century. Biographies up to that time had been lists of dates and 'just-the-facts' and then you saw famous people and wealthy people commissioning biographers to write narrative biographies. And the most famous of them was this guy James Boswell," Lider said.
"So we thought, what if we could democratize this? I think a big story of Google and technology is the bringing of things to people that were formerly only available to the elite. So the idea that we could be your personal storyteller, be your personal biographer, help you articulate the narrative arcs of points of your life was really exciting to us."
"Sherlock Holmes at one point says, 'I'd be lost without my Boswell,'" Smarr added. "We liked the idea of this little agent following you around, trying to help you, and remember where you've been."
"If you think of Google Now as your in-the-moment assistant," Lider continued, "Boswell is retrospective."
"If Now is the assistant, Boswell is a friend," added Ng. "He'll go on adventures with you."
So, they had a concept: They would put a narrative biographer on every smartphone.
Now they just had to build it.
"We started trying to figure it out," Smarr said. "Could we string these photos together, pick the best ones, figure out the locations, and actually draw it, and guess when the story started and ended, and give it a title automatically?"
Maybe it sounds like a simple task: mash up smartphone data and photographs to stitch a story together. Assuming a user uploads the photos to Google, and takes some of them with a smartphone, Google's stockpile would contain location data, visual cues, and a time indicator.
But sometimes the photos would arrive from cameras without GPS. Sometimes the date stamps would be wrong. Sometimes the location data would be incorrect. It could be a mess, and unpredictability was baked into users' trips.
"You basically are given this stream of user data, which is whatever they gave you, sometimes it's a lot, sometimes it's a little, sometimes it's pretty complete, sometimes it's pretty incomplete, and you try to make sense of it," Smarr said.
There are three really useful signals for the good bot Boswell. The first is geotagged photos, which pin one's location down precisely. The second is Google Now or Google Maps data, which indicate where someone went. The last is the coolest. The team can infer a user's location by doing machine vision on certain landmarks. So, if someone goes to New York and takes pictures of the Flatiron Building (as everyone is wont to do), then Google can recognize that there is a landmark it knows, and tag that location, even if the photo was taken on a camera that does not embed location metadata in the image.
"We have a whole bunch of algorithms that we run over the data," Smarr said, "and we basically try to figure out if we have enough signal that we have some clue of where you went." Once the algorithms hit a threshold level, the design software kicks in, autogenerating the digital object.
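The shape of that decision can be sketched as scoring each photo by its strongest available signal and generating a story only when enough of the set clears a confidence bar. Everything here (signal names, weights, thresholds) is an assumption for illustration; the article only tells us the three signals exist and that some threshold triggers generation:

```python
# Hypothetical sketch of combining location signals with a threshold.
# The weights and threshold values are illustrative assumptions,
# not Google's actual algorithm.

def infer_location_confidence(photo):
    """Score how confidently a photo can be placed, from 0.0 to 1.0."""
    score = 0.0
    if photo.get("gps"):                 # geotag embedded in EXIF
        score = max(score, 0.9)
    if photo.get("maps_history_match"):  # nearby Google Now / Maps history
        score = max(score, 0.6)
    if photo.get("landmark"):            # machine-vision landmark match
        score = max(score, 0.7)
    return score

CONFIDENCE_THRESHOLD = 0.5  # per-photo bar (assumed value)

def enough_signal(photos, threshold=CONFIDENCE_THRESHOLD):
    """Decide whether to auto-generate a story from a photo set."""
    placed = [p for p in photos if infer_location_confidence(p) >= threshold]
    # assume at least half the set must be placeable
    return len(placed) >= len(photos) * 0.5

photos = [
    {"gps": (40.741, -73.989)},         # geotagged shot
    {"landmark": "Flatiron Building"},  # recognized via machine vision
    {},                                 # no usable signal
]
print(enough_signal(photos))  # → True: 2 of 3 photos are placeable
```

The max() rule reflects the intuition that one strong signal (a geotag) is enough to place a photo; a real system would more likely fuse signals probabilistically.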
There are a bunch of little touches worth dwelling on. First, the design runs left to right, based on Lider's research on how people thought of scrapbooks. Viewed on a mobile device, the images do Ken Burns-style pans and zooms, and a title sits at the bottom ("Trip to New York," one of mine says) with a little editing pencil that encourages personalization.
Swipe in and you notice that a light gray line connects both photos and little dotted-edge boxes that read, "Add a narrative." It's like they created the armature of a scrapbook and then let the humans finish it off.
At each location that Stories can detect, there is an image of that place inlaid within a circle; it's linked to Google, so a user, or anyone a story is shared with, can investigate the locations in a trip. (Maybe less useful for Queens, as one of mine reads, but more useful for something like ... the Seychelles or something.)
That gray line is almost playful as it tears around the screen at different angles, sometimes with little time-stamps embedded in its trajectory.
The images on a mobile device also have a little bit of what I'd call "jitter" as one swipes around. It's a nice effect, something in line perhaps with the Google "material design" ethos. It feels as if these images are sitting on something, not totally flush with the screen background. It adds to that scrapbook feel.
Unlike an Instagram, where every photo is the same size and shape, Stories uses all different sizes and shapes of photographs. Animations are in there, too, thanks to the "auto-awesome" effects Google has had for years, which create collages and animations out of strings of photos. "The idea is to make you feel that it is something put together by a person, not a machine," Ng said.
And it does feel that way—or at least idiosyncratic, if not human—which really helps a product that can't do everything perfectly. Even in cases where the story Google generates is not the one I would have told, or it includes a random picture, there is a personality to these little narratives that I actually find more sympathetic than the average software product. I don't mind forgiving Boswell his imperfections. At least he tries to do something with my sorry adventures.
"We're trying to present the essence of the thing," Smarr said, "and it is harder to distill the essence than it is to regurgitate the user's data back at them."