Twitter may provide useful clues for detecting a spike in influenza, but its users are too different from the population at large for counting votes before they're cast.
There is a sense among big-data utopians that the world we live in is eminently knowable, that buried within the titanic collections of data are the answers to virtually any question if one knows how to look. Nowhere is this more apparent than in the recent trend of data-driven attempts to PREDICT THE FUTURE (yes, please read that phrase with a Wizard-of-Oz-like booming echo). Researchers have claimed that online social data can grant the ability to predict everything from box-office revenues and the spread of disease to election outcomes. This is futile and ridiculous. Despite our deep-seated desire that the world be tractable and controllable, we can't predict the future.
The best we could possibly do is early detection. That is, in the best of circumstances it is possible to detect the online projections and manifestations of existing offline phenomena that tend to coincide with particular outcomes or events. This works best when there are clear and understandable mechanisms of interaction between these offline phenomena and online social media participation. For instance, early detection of flu outbreaks with Twitter is based on the understanding that people tweet about themselves and their experiences, and as more people fall ill in a given area, more people in that area will tweet about their symptoms. Given that the flu is something we are all exposed to, there is no reason to believe that the sample of people on Twitter is not representative of larger patterns. In cases like this the path from a real-world phenomenon to people expressing that phenomenon online is obvious, and the danger of that expression being highly skewed or disproportionate is low. Hence, early detection using social media is viable.
However, election predictions are a wholly different matter. Election forecasting with Twitter is a particularly telling example of the cocktail of hubris and naïveté that is widespread in social-media prediction work. For instance, in a widely cited 2010 paper titled "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment," researchers in Germany argued that Twitter is a "valid real-time indicator of political sentiment" in which "the mere number of tweets mentioning a political party" has predictive power that rivals traditional polling. However, this paper, which claimed to have matched traditional polling's error rates for the 2009 German Parliamentary Elections, illustrates many of the problems with such predictive studies.
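To make the claim concrete, here is a minimal sketch of what a mention-count "prediction" amounts to: each party's share of tweet mentions is treated as its predicted vote share, and the gap from the real result is averaged. This is one plausible reading of the approach, not the paper's actual procedure, and the party names and every number below are invented for illustration.

```python
def mention_shares(mention_counts):
    """Convert raw tweet-mention counts into percentage shares."""
    total = sum(mention_counts.values())
    if total == 0:
        raise ValueError("no mentions to compute shares from")
    return {party: 100.0 * count / total
            for party, count in mention_counts.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute gap, in percentage points, across parties."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)


# Hypothetical data: these are NOT real tweet counts or election results.
mentions = {"Party A": 500, "Party B": 300, "Party C": 200}
results = {"Party A": 40.0, "Party B": 35.0, "Party C": 25.0}

shares = mention_shares(mentions)
mae = mean_absolute_error(shares, results)
print(shares)
print(round(mae, 2))
```

Nothing in this calculation knows who wrote the tweets, whether the authors can or will vote, or whether a mention signals support, which is exactly the gap discussed next.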
Strong early-detection work is seriously grounded in the offline social dynamics and phenomena that would lead someone to express a related sentiment online. Work on "predicting" election outcomes is not. Public-opinion polling, the contemporary gold standard of election forecasting, involves incredibly sophisticated sampling procedures to identify "likely voters" as opposed to "registered voters," often stratifying by various populations of interest that might otherwise be under-represented. This is a means of grounding the work in the real social dynamics of voting. Only by building into the predictive model a view of what will actually get which people to the polls is it possible to translate the loosely held public political sentiment of the moment into something that relates to actual outcomes on election day. In Twitter prediction to date there has been no such subtle inclusion of the dynamics of participation and how these map to real-world action.
One significant problem is that Twitter is a notably non-representative sample of people. While the demographics of the user base are not yet totally understood (it is difficult to do work on demography on anonymous or pseudonymous platforms), research indicates that the Twitter population in the US, for example, over-represents males, Caucasians, and people in coastal and urban regions. This population does not differ from the general public in dimensions that are particularly relevant for influenza contraction, but it differs significantly in ways that are quite relevant for gauging public opinion.
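A small sketch shows how much a skewed sample can move an estimate, and what the stratified weighting used by pollsters does about it. The groups, their population shares, and the support levels below are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
def raw_estimate(sample):
    """Unweighted share of supporters in a list of (group, supports) pairs."""
    return sum(supports for _, supports in sample) / len(sample)


def poststratified_estimate(sample, population_shares):
    """Weight each group's support rate by its share of the real population."""
    by_group = {}
    for group, supports in sample:
        by_group.setdefault(group, []).append(supports)
    estimate = 0.0
    for group, share in population_shares.items():
        if group not in by_group:
            raise ValueError(f"no respondents sampled from group {group!r}")
        responses = by_group[group]
        estimate += share * sum(responses) / len(responses)
    return estimate


# Hypothetical sample: 80 urban users (48 supporters), 20 rural users (6 supporters).
sample = ([("urban", 1)] * 48 + [("urban", 0)] * 32
          + [("rural", 1)] * 6 + [("rural", 0)] * 14)

# Hypothetical population in which the two groups are actually the same size.
population_shares = {"urban": 0.5, "rural": 0.5}

raw = raw_estimate(sample)
weighted = poststratified_estimate(sample, population_shares)
print(round(raw, 2), round(weighted, 2))
```

In this invented case the raw sample suggests majority support while the reweighted figure does not. Reweighting of this kind can only correct for traits that are known and measured, though; it does nothing about who chooses to tweet about politics or who will actually turn out to vote.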