You Can't Use Twitter to Predict Election Results

Twitter may provide useful clues for detecting a spike in influenza, but its users are too different from the population at large for counting votes before they're cast.


President Barack Obama on his Blackberry, no doubt thinking about Twitter (Reuters).

There is a sense among big-data utopians that the world we live in is eminently knowable, that buried within the titanic collections of data are the answers to virtually any question if one knows how to look. Nowhere is this more apparent than in the recent trend of data-driven attempts to PREDICT THE FUTURE (yes, please read that phrase with a Wizard-of-Oz-like booming echo). Researchers have claimed that online social data can grant the ability to predict everything from box-office revenues and the spread of disease to election outcomes. This is futile and ridiculous. Despite our deep-seated desire that the world be tractable and controllable, we can't predict the future.

The best we could possibly do is early detection. That is, in the best of circumstances it is possible to detect the online projections and manifestations of existing offline phenomena that tend to coincide with particular outcomes or events. This works best when there are clear and understandable mechanisms of interaction between these offline phenomena and online social media participation. For instance, early detection of flu outbreaks with Twitter is based on the understanding that people tweet about themselves and their experiences, and as more people fall ill in a given area, more people in that area will tweet about their symptoms. Given that the flu is something we are all exposed to, there is no reason to believe that the sample of people on Twitter is not representative of larger patterns. In cases like this the path from real-world phenomena to people expressing those phenomena online is obvious, and the danger of that expression being highly skewed or disproportionate is low. Hence, early detection using social media is viable.

However, election predictions are a wholly different matter. Election forecasting with Twitter is a particularly trenchant example of the cocktail of hubris and naïveté that is widespread in social-media prediction work. For instance, in a particularly well-cited 2010 paper titled "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment," researchers in Germany argued that Twitter is a "valid real-time indicator of political sentiment" in which "the mere number of tweets mentioning a political party" has predictive power that rivals traditional polling. However, this paper, which claimed to have matched traditional polling's error rates for the 2009 German Parliamentary Elections, is indicative of many of the problems with such predictive studies.
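To make concrete what "the mere number of tweets mentioning a political party" amounts to as a forecasting method, here is a minimal sketch in Python. It is not the paper's code: the party names, tweet counts, and vote shares are invented purely for illustration, and the function names are hypothetical. The approach simply treats each party's share of mentions as its predicted vote share, then scores the result by mean absolute error.

```python
def mention_shares(counts):
    """Turn raw mention counts into each party's share of all mentions."""
    total = sum(counts.values())
    return {party: n / total for party, n in counts.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual shares."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)


# Hypothetical numbers, invented for illustration only (not real election data).
tweet_counts = {"Party A": 600, "Party B": 300, "Party C": 100}
vote_shares = {"Party A": 0.50, "Party B": 0.35, "Party C": 0.15}

predicted = mention_shares(tweet_counts)
mae = mean_absolute_error(predicted, vote_shares)

for party in tweet_counts:
    print(f"{party}: predicted {predicted[party]:.2f}, actual {vote_shares[party]:.2f}")
print(f"Mean absolute error: {mae:.3f}")
```

Notice what the sketch leaves out: nothing in it asks who is tweeting, whether a mention is supportive or hostile, or whether the person tweeting will vote at all.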

Strong early detection work is seriously grounded in the offline social dynamics and phenomena that would lead someone to express a related sentiment online. Work on "predicting" election outcomes is not. Public-opinion polling -- the contemporary gold standard of election forecasting -- involves incredibly sophisticated sampling procedures to identify "likely voters" as opposed to "registered voters," often stratifying by various populations of interest that might otherwise be under-represented. This is a means of grounding the work in the real social dynamics of voting. Only by building into the predictive model a view of what will actually get which people to the polls is it possible to translate the loosely held public political sentiment of the moment into something that relates to actual outcomes on election day. In Twitter prediction to date there has been no such subtle inclusion of the dynamics of participation and how these map to real-world action.

One significant problem is that Twitter is a notably non-representative sample of people. While the demographics of the user base are not yet totally understood (it is difficult to do work on demography on anonymous or pseudonymous platforms), research indicates that the Twitter population in the US, for example, over-represents males, Caucasians, and people in coastal and urban regions. This population does not differ from the general public in dimensions that are particularly relevant for influenza contraction, but it differs significantly in ways that are quite relevant for gauging public opinion.
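A small sketch in Python shows why this skew matters for opinion but not for the flu. All figures below are hypothetical, chosen only to make the arithmetic visible: a sample that is 80 percent urban is drawn from a population that is evenly split, and the two groups hold different views of a candidate. Weighting each group by its share of the sample gives one answer; weighting by its share of the population (the basic idea behind the stratification pollsters use) gives another.

```python
def weighted_support(group_weights, group_support):
    """Overall support as a weighted average of per-group support."""
    return sum(group_weights[g] * group_support[g] for g in group_weights)


# Hypothetical figures, invented for illustration only.
sample_share = {"urban": 0.8, "rural": 0.2}      # who shows up in the sample
population_share = {"urban": 0.5, "rural": 0.5}  # who is actually out there
support = {"urban": 0.6, "rural": 0.4}           # support for a candidate, by group

raw = weighted_support(sample_share, support)
reweighted = weighted_support(population_share, support)

print(f"Raw sample estimate:   {raw:.2f}")
print(f"Population-reweighted: {reweighted:.2f}")
```

With these made-up inputs the skewed sample overstates the candidate's support by six points. If both groups held the same view, as is plausibly the case for catching the flu, the two estimates would coincide. Reweighting of this kind only corrects for skews that are known and measured, and it says nothing about who will actually turn out to vote.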

Alexander Furnas is a research fellow at the Sunlight Foundation in Washington, D.C.
