A New Study Says Twitter Can Predict US Elections

The more a candidate is mentioned on Twitter, the better. (After you control for a lot.)

While the Indiana University study examined House of Representatives and not presidential candidates, according to its findings, a tweet from John Boehner about Barack Obama would predict the election of... Barack Obama. (Reuters)

Political consultants, university sociologists, and amateur statisticians are now one step closer to using data from Twitter like they use polling data.

An Indiana University study has pulled, from the noise and muck of the stream, a statistically significant relationship between Twitter data and US election results.

Specifically, the study found a correlation between the number of times a candidate for the House of Representatives was mentioned on Twitter in the months before an election and his or her performance in that election. The more a candidate is mentioned on Twitter, the better.

Previous studies have found a loose relationship between activity on Twitter and financial or political events. A 2010 study found that certain ways of analyzing Twitter could foretell a film's success at the box office better than a prediction market for that purpose. In 2011, a German Ph.D. student correlated stock market performance to certain analytic variables on Twitter. At the same time, activists are often disappointed when chatter about a candidate on Twitter fails to translate to electoral victory, and there's a sense that Twitter has almost no relation to political success

This study is notable because it proposes a meaningful relationship between Twitter and American election results. "From the beginning, we were looking to construct simple and easy to operationalize measures of political behavior and attitudes that could be useful to social scientists," Joe DiGrazia, one of the researchers, wrote to me in an email.

The study specifically controlled for the conditions surrounding an election. If a candidate is an incumbent, they would be mentioned on Twitter more, so the study discounted their position. Likewise, the study also discounted candidates who the press covered more, using the number of times a candidate's name was mentioned on CNN as an imprecise measurement of mainstream media hype.

The study's finding bears a number of implications. First, it implies that all Twitter chatter is good Twitter chatter: that if more people are talking about you, you're likely to do better at the polls. Second, it provides a working model for predictive Twitter analysis in future elections, especially in the US. As Alex Roarty wrote at National Journal:

In [one of the study's researchers Fabio] Rojas's view, the findings should revolutionize how campaigns conduct themselves. Rather than spending hundreds of thousands, or even millions, of dollars on surveys, campaigns could simply gauge their status on Twitter. That should help campaigns with fewer resources compete with well-heeled incumbents, he said.

While I'm intrigued by the idea of a pesky challenger defeating a "well-heeled incumbent" without polling data, running on Twitter analysis alone, I suspect professional political Twitter analysts will pop up soon enough. Twitter already sells huge bulks of tweets to economic and financial consultants, who use the aggregate feed to predict the stock market; the company would be foolish not to make money off this capability in politics, too. Rather, what struck me about the study was that its researchers thought they could sidestep problems which plague polling data: People will tell pollers they plan to vote for the fashionable candidate, for instance, when they really intend on voting for someone else. I doubt that Twitter, inherently public and performative, avoids that bias.

At last count, eight percent of American adults use Twitter daily; only 15 percent are on it at all. The authors recognize this, yet their algorithms and data remain valid. So I wonder about the effects we'll see from work like this in the next decade or so, from the politicization and commercialization of the digital public sphere we've made, in which the journaling agglomerate can predict the future, in which our textual debates are machine-readable.