In 2008, Google released an experiment called Flu Trends, which attempted to predict the prevalence of the flu from users' searches on about 40 flu-related queries.
Based on the data up to that point in time, Flu Trends worked really well. The Centers for Disease Control, which had been involved in shaping how it functioned, liked the data that it produced.
"We really are excited about the future of using different technologies, including technology like this, in trying to figure out if there's better ways to do surveillance for outbreaks of influenza or any other diseases in the United States," Joseph Bresee, the chief of the epidemiology and prevention branch in the CDC's influenza division, said at the time.
And so, aside from some misplaced nagging about privacy, the new tool was celebrated in the news media. All the big outlets covered it: CNN, The New York Times, the Wall Street Journal, and many, many more.
Flu Trends fit the golden image of Google, circa 2008: a company that did gee-whiz things just because they were good ideas.
This was the era of Google.org, formed under a guy named Larry Brilliant, who said things like, "I envision a kid (in Africa) getting online and finding that there is an outbreak of cholera down the street. I envision someone in Cambodia finding out that there is leprosy across the street." As the global economy collapsed, Google gleamed among the ruins.
This was all years before people started even talking about "big data."
But times have changed. Now, data talk is everywhere, and people are more worried than excited. The NSA looms over every tech discussion.
And Google no longer seems so different from other companies. As Bill Gates put it in an interview with Bloomberg Businessweek, "Google started out saying they were going to do a broad set of things. They hired Larry Brilliant, and they got fantastic publicity. And then they shut it all down. Now they're just doing their core thing." Larry Brilliant left in 2009. Google.org redirects to Google.com/giving/.
Which brings us back to Google Flu Trends. If you've been watching the headlines this month, you've probably seen that a study published in the journal Science took Flu Trends to task for "big data hubris." A team led by Northeastern political scientist David Lazer found that Flu Trends had been overestimating flu prevalence, including in "100 out of 108 weeks starting with August 2011." They pointed to problems with the opacity of Google's method and the inconsistency of the Google search user interface and algorithms. The paper was titled, "The Parable of Google Flu: Traps in Big Data Analysis."
And the response to it in the media was swift and rough. Here is a list of just a few of the headlines that followed Lazer's methodological fisking:
Google Flu Trends Gets It Wrong Three Years Running
Why Google Flu Is a Failure
A Case of Good Data Gone Bad
Data Fail! How Google Flu Trends Fell Way Short
Google Flu Trends Failure Shows Drawbacks of Big Data
Even the reliable pro-nerd hangout Slashdot headlined their thread on the story, "Google Flu Trends Suggests Limits of Crowdsourcing."
A skim of the headlines and most of the stories would lead you to believe that Google Flu Trends had gone horribly wrong. This thing was nominally supposed to predict future CDC reports, and it wasn't even as good as simple extrapolations from past CDC reports.
How do you like them apples, Google/Big Data?
But lurking in the Lazer paper was an interesting fact about the Google Flu Trends data: when you combined it with the CDC's standard monitoring, you actually got a better result than either could provide alone. "Greater value can be obtained by combining GFT with other near–real time health data," Lazer wrote. "For example, by combining GFT and lagged CDC data."
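To make "combining GFT and lagged CDC data" concrete, here is a minimal sketch in Python. It is not the model from the Lazer paper, whose details are not given here. It assumes a plain linear regression with an intercept, a two-week reporting lag, and entirely synthetic numbers in which the search-based estimate runs about 30 percent high. The names `search_est`, `lagged`, and `LAG` are hypothetical, chosen only for illustration.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ols(features, target):
    """Least-squares weights for target ~ intercept + features, via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear(xtx, xty)

def mse(pred, actual):
    """Mean squared error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)

# Synthetic data only: none of these numbers are real flu measurements.
LAG = 2  # assumed reporting delay, in weeks
weeks = range(40)
true_rate = [2.0 + 1.5 * math.sin(2 * math.pi * t / 26) for t in weeks]
# Stand-in for a search-based estimate: biased high, plus some wobble.
search_est = [1.3 * y + 0.3 * math.cos(1.7 * t) for t, y in zip(weeks, true_rate)]

target = true_rate[LAG:]       # what we want to know this week
search_now = search_est[LAG:]  # available immediately
lagged = true_rate[:-LAG]      # official figure from LAG weeks ago

weights = fit_ols(list(zip(search_now, lagged)), target)
combined = [weights[0] + weights[1] * s + weights[2] * l
            for s, l in zip(search_now, lagged)]

print("search estimate alone:", mse(search_now, target))
print("lagged official alone:", mse(lagged, target))
print("combined (fitted)    :", mse(combined, target))
```

One caveat: the comparison above is in-sample, so the fitted combination cannot do worse than either raw input on these same weeks. A fair test would fit the weights on earlier weeks and score them on later ones.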
If that was true, and the CDC was aware of it, couldn't they simply combine the data on their own and have a better epidemiological understanding of the country? And if that was true, wasn't Flu Trends a success, at least according to the standards laid out in the Nature paper describing it in 2009?
"The system is not designed to be a replacement for traditional surveillance networks or supplant the need for laboratory-based diagnoses or surveillance," Google and CDC authors wrote. They continued, "As with other syndromic surveillance systems, the data are most useful as a means to spur further investigation and collection of direct measures of disease activity."
In other words, Google Flu Trends is not a magical tool that replaces the CDC with an algorithm. But, then, the people who built it never imagined that it was. If it failed, it did so, more than anywhere else, in the popular imagination—and in the wishes of superficial Big Data acolytes.