Twitter may provide useful clues for detecting a spike in influenza, but its users are too different from the population at large for counting votes before they're cast.
There is a sense among big-data utopians that the world we live in is eminently knowable, that buried within the titanic collections of data are the answers to virtually any question if one knows how to look. Nowhere is this more apparent than in the recent trend of data-driven attempts to PREDICT THE FUTURE (yes, please read that phrase with a Wizard-of-Oz-like booming echo). Researchers have claimed that online social data can grant the ability to predict everything from box-office revenues and the spread of disease to election outcomes. This is futile and ridiculous. Despite our deep-seated desire that the world be tractable and controllable, we can't predict the future.
The best we could possibly do is early detection. That is, in the best of circumstances it is possible to detect the online projections and manifestations of existing offline phenomena that tend to coincide with particular outcomes or events. This works best when there are clear and understandable mechanisms of interaction between these offline phenomena and online social media participation. For instance, early detection of flu outbreaks with Twitter is based on the understanding that people tweet about themselves and their experiences, and as more people fall ill in a given area, more people in that area will tweet about their symptoms. Given that the flu is something we are all exposed to, there is no reason to believe that the sample of people on Twitter is not representative of larger patterns. In cases like this the path from a real-world phenomenon to people expressing that phenomenon online is obvious, and the danger of that expression being highly skewed or disproportionate is low. Hence, early detection using social media is viable.
However, election predictions are a wholly different matter. Election forecasting with Twitter is a particularly trenchant example of the cocktail of hubris and naïveté that is widespread in social-media prediction work. For instance, in a particularly well-cited 2010 paper titled "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment," researchers in Germany argued that Twitter is a "valid real-time indicator of political sentiment" in which "the mere number of tweets mentioning a political party" has predictive power that rivals traditional polling. However, this paper, which claimed to have matched traditional polling's error rates for the 2009 German Parliamentary Elections, is indicative of many of the problems with such predictive studies.
Strong early detection work is seriously grounded in the offline social dynamics and phenomena that would lead someone to express a related sentiment online. Work on "predicting" election outcomes is not. Public-opinion polling -- the contemporary gold standard of election forecasting -- involves incredibly sophisticated sampling procedures to identify "likely voters" as opposed to "registered voters," often stratifying by various populations of interest that might otherwise be under-represented. This is a means of grounding the work in the real social dynamics of voting. Only by building into the predictive model a view of what will actually get which people to the polls is it possible to translate the loosely held public political sentiment of the moment into something that relates to actual outcomes on election day. In Twitter prediction to date there has been no such subtle inclusion of the dynamics of participation and how these map to real-world action.
One significant problem is that Twitter is a notably non-representative sample of people. While the demographics of the user base are not yet totally understood (it is difficult to do work on demography on anonymous or pseudonymous platforms), research indicates that the Twitter population in the US, for example, over-represents males, Caucasians, and people in coastal and urban regions (PDF). This population does not differ from the general public in dimensions that are particularly relevant for influenza contraction, but it does differ significantly in ways that are quite relevant for gauging public opinion.
Furthermore, political tweeting is a small niche activity, and the self-selected sub-population of wonks, politicos and ideologues who engage in it is even less representative of the voting public than the general Twitter population. On top of that, research has shown (PDF) that even in a sample of users who have engaged in political tweeting there is a vocal minority of frequent tweeters who dominate the silent majority of those who only rarely tweet about politics. When predictions are based on the frequency of party or candidate mentions, as they have tended to be, this means that an incredibly unrepresentative minority of a minority of a minority is generating the data. Rather than it making sense that tweet counts reflect public opinion and thus election outcomes, in such biased conditions it would, in fact, be shocking if such a simple metric worked.
Indeed, a replication by researchers at the University of Bamberg showed that had the Pirate Party been included in data collection by the authors of the earlier paper, their model would have predicted that the party would win 34.8 rather than 2.1 percent of the vote in the 2009 parliamentary elections. A simplistic measurement of Twitter messages I conducted the day before the French election earlier this month is similarly confounding (although much less rigorously done). According to the frequency of their mentions on Twitter, François Hollande, who defeated Nicolas Sarkozy by 2.28 percent, should have instead lost by 7 percent. Or if we went by the number of users who mentioned them rather than the number of tweets, Hollande should have won by 15 percent, as he was mentioned by far more people but fewer times per person (a nice example of the vocal minority vs. silent majority issue).
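To see how the same pile of tweets can point in opposite directions, here is a minimal Python sketch of the two counting methods just described. Everything in it is an assumption made for illustration: the toy data, the candidate labels "A" and "B," and the function names are invented, and this is not the measurement behind the French figures above.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (user_id, candidate_mentioned) pairs. Not real tweets.
# One very active user ("u1") accounts for most of the mentions of candidate A.
tweets = [
    ("u1", "A"), ("u1", "A"), ("u1", "A"),
    ("u1", "A"), ("u1", "A"), ("u1", "A"),
    ("u5", "A"),
    ("u2", "B"), ("u3", "B"), ("u4", "B"),
]

def share_by_tweets(tweets):
    """Share of all mentions per candidate; every tweet counts once."""
    counts = Counter(candidate for _, candidate in tweets)
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}

def share_by_users(tweets):
    """Share of distinct mentioning users per candidate.

    Each user counts once per candidate, however often they tweet.
    A user who mentions both candidates is counted for both.
    """
    users = defaultdict(set)
    for user, candidate in tweets:
        users[candidate].add(user)
    total = sum(len(ids) for ids in users.values())
    return {candidate: len(ids) / total for candidate, ids in users.items()}

if __name__ == "__main__":
    print("by tweets:", share_by_tweets(tweets))
    print("by users: ", share_by_users(tweets))
```

In this toy set the tweet-level tally favors A, 70 percent to 30, while the user-level tally favors B, 60 percent to 40. Neither number says anything about who will actually show up to vote.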
Other studies have provided additional reasons for skepticism. Some research has indicated that Twitter is only slightly better than chance at prediction (PDF), while others have confirmed that tweet-counting is not effective, but that adding a layer of sentiment analysis to determine what messages say can improve results somewhat (PDF).
It may be possible to model public opinion with Twitter, but it would require a much more sophisticated understanding than we now have about who tweets about politics and why, how their tweets relate to their offline actions, and how they differ from the general voting population. Without high-level mechanisms for accounting for these systemic biases, proper weighting, and careful, context-sensitive interpretation and analysis, election prediction with Twitter is a case of "garbage in, garbage out" data work. And even at best, such modeling would just provide us a snapshot of public opinion - an early detection of the political zeitgeist as election day approached - not an actual prediction of the outcome. No matter how many petabytes or yottabytes of data you throw at a question, we don't live in a clockwork universe, data scientists are not prescient, and we can't predict the future.
Update, May 17, 2012, 3:27 pm: In recognition of the valuable work done by the researchers who have advanced our understanding of the topics described above, I would like to mention those cited by name: The work cited on predicting pandemic flu was done by Lampos and Cristianini; research on the demographics of Twitter was conducted by Mislove et al.; Mustafaraj et al. observed the silent majority vs. vocal minority dynamics discussed above; Jungherr et al. performed the skeptical replication of Tumasjan's work; Metaxas et al. found that Twitter barely outperforms chance; Sang and Bos showed that sentiment analysis can increase performance; Gayo-Avello offers a useful and extensive literature review for anyone interested in learning more.