It's like they're trying to speak to me, I know it!  (Illustration: James Boast)



Needle in a haystack? That's easy. Try fishing a genuine, precisely targeted finding out of an ocean of billions of tweets.

Ever wonder whether your water supply is near one of the oil and gas wastewater disposal facilities you’ve been reading so much about? If you live in Texas, you’re in luck: The small, nonprofit Texas Tribune has turned public data into an interactive map. Just plug in your zip code and find out.

Or how about getting granular detail from the U.S. Census for your neck of the woods? The standard-bearing New York Times has done that county by county for all fifty states. Just roll your cursor over your hometown.

These days, interactive graphics are easy to find: charts, graphs, maps, and manipulable databases on everything from doctors and pharmaceuticals to Netflix’s almost absurdist film subgenres.

That’s thanks to the relatively new practice of data journalism, with which media organizations large and small have been ferreting out trends, facts, and stories that would otherwise never have come to light. Data analytics have become a critical tool for every major news organization, and tech-savvy reporters who can write code, plumb databases, and accurately interpret complex statistics are among every newsroom’s most prized assets.

Of course, data miners are only as good as their analysis, and their analysis is only as good as their tools. For lack of sufficiently robust analytic tools, a mother lode of data in social media is still waiting to be tapped.

Consider Twitter, which has more than 288 million monthly active users. More than 300 billion tweets have been posted since it launched in 2006, amounting to a short history of public opinion on the news events of the 21st century.

Extracting actionable intelligence from all that data is like finding a very tiny needle in the world’s largest haystack, but news organizations have sound reasons to compete with one another to meet that challenge.

Seth McGuire, senior business development manager for data channels at Twitter, thinks about the uses and value of those needles every day—including the infamous day earlier this year when New York City shut itself down for a winter storm that never came.


“If the New York snowstorm had turned out to be the epic blizzard everybody thought it would be,” he said, “it would have been useful to look at Hurricane Sandy data to find the areas of the city that were hardest hit. Here’s where people tweeted the most for emergency response. Here’s where people were unable to get services. Here’s where people are likely to experience food shortages.”

“When we talk about data on our side,” McGuire continued, “what we mean is it’s not just the public body of a tweet, but really all of the public information contained within it.”  

He was referring in part to a tweet’s metadata: the location from which it was sent, the time, whether anyone responded and, if so, when and from where, and so on. That kind of information is almost always relevant and often critical. A journalist researching reactions to a major presidential speech wants to know what people thought about it, but would have a much better story knowing, for example, what people of voting age in swing states thought, or what young people in a particular Middle Eastern country thought.
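To make the idea concrete, here is a minimal sketch in Python of the kind of metadata filtering described above: keep only reactions sent from swing states within a couple of hours of a speech, then rank them by how much discussion they drew. The record fields (`sent_at`, `state`, `reply_count`), the function name `swing_state_reactions`, the list of swing states, and the sample tweets are all hypothetical simplifications for illustration, not Twitter’s actual data format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


# Hypothetical, simplified tweet record. Real tweet payloads carry far more
# metadata under different field names; these fields are assumptions.
@dataclass
class Tweet:
    text: str
    sent_at: datetime      # when the tweet was posted
    state: Optional[str]   # U.S. state the sender is in, if known
    reply_count: int       # how many replies it drew


# Illustrative set of swing states for this example only.
SWING_STATES = {"FL", "OH", "PA", "NC", "WI"}


def swing_state_reactions(tweets: List[Tweet], speech_start: datetime,
                          window_hours: float = 2) -> List[Tweet]:
    """Return tweets sent from a swing state within `window_hours` after the
    speech began, most-replied-to first."""
    window_end = speech_start + timedelta(hours=window_hours)
    matches = [
        t for t in tweets
        if t.state in SWING_STATES and speech_start <= t.sent_at <= window_end
    ]
    return sorted(matches, key=lambda t: t.reply_count, reverse=True)


# Usage with made-up sample data.
speech = datetime(2015, 1, 20, 21, 0)
sample = [
    Tweet("Strong speech tonight", speech + timedelta(minutes=10), "OH", 4),
    Tweet("Didn't watch it", speech + timedelta(minutes=5), "CA", 0),
    Tweet("Still thinking about it", speech + timedelta(hours=5), "FL", 1),
    Tweet("Not convinced", speech + timedelta(minutes=30), "PA", 9),
]
hits = swing_state_reactions(sample, speech)
print([t.text for t in hits])  # ['Not convinced', 'Strong speech tonight']
```

Even this toy filter shows why metadata matters: the text of a tweet alone says nothing about where or when a reaction came from, or whether anyone engaged with it.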

But news organizations have an even more compelling reason to focus on Twitter: It has become a serious competitor for breaking news. The difference between getting a good tip from Twitter and being scooped by it can be a matter of seconds, even microseconds, and on Wall Street, for example, a few microseconds can be a lifetime. That’s why major data companies like Thomson Reuters and Bloomberg have incorporated Twitter into the feeds they provide to financial clients.

The reason nobody has yet managed to pluck scoops reliably out of Twitter’s aptly nicknamed “firehose” is that the critical intelligence of the moment could come from John Doe with 23 followers or from Jane Doe with 2.3 million. A human reporter could watch the most surgically precise TweetDeck feed all day long and never come close to spotting it.

Enter Watson, IBM’s cognitive computing system. IBM and Twitter formed a partnership last fall on the theory that Watson could master the undifferentiated mass of data that social networks like Twitter serve up, and the theory makes intuitive sense. Cognitive computing is Watson’s specialty: understanding natural language, “learning” from mistakes, adapting to new conditions, and solving novel problems.

Watson demonstrated those abilities on Jeopardy! Besides being able to learn, it can fish in the deepest ocean of data, and compared with even the most impressive human quants, it’s really, really fast.

Like Twitter’s McGuire, Inhi Cho, vice president of strategy and business development at IBM Analytics, spends her days imagining the many ways Watson and its existing and potential APIs could make sense of what she calls “the largest corpus of human thought…the pulse of the planet.”

IBM doesn’t have Twitter all to itself: Some media outlets already subscribe to services like Dataminr, which sifts the stream for tweets that might signal breaking news, and other companies combine access to the firehose with their own proprietary analytics.

But IBM has Watson, and it is offering to work with companies that want to build apps on its cognitive computing platform. In other words, IBM and Twitter will provide the data, infrastructure, and expertise, and it will be up to enterprising news organizations to decide how far to take them.

Cho thinks that will be a long way indeed: “I really think this is a brand new era for computing.”