Twitter may provide useful clues for detecting a spike in influenza, but its users are too different from the population at large for counting votes before they're cast.
There is a sense among big-data utopians that the world we live in is eminently knowable, that buried within the titanic collections of data are the answers to virtually any question if one knows how to look. Nowhere is this more apparent than in the recent trend of data-driven attempts to PREDICT THE FUTURE (yes, please read that phrase with a Wizard-of-Oz-like booming echo). Researchers have claimed that online social data can grant the ability to predict everything from box-office revenues and the spread of disease to election outcomes. This is futile and ridiculous. Despite our deep-seated desire that the world be tractable and controllable, we can't predict the future.
The best we could possibly do is early detection. That is, in the best of circumstances it is possible to detect the online projections and manifestations of existing offline phenomena that tend to coincide with particular outcomes or events. This works best when there are clear and understandable mechanisms of interaction between these offline phenomena and online social media participation. For instance, early detection of flu outbreaks with Twitter is based on the understanding that people tweet about themselves and their experiences, and as more people fall ill in a given area, more people in that area will tweet about their symptoms. Given that the flu is something we are all exposed to, there is no reason to believe that the sample of people on Twitter is not representative of larger patterns. In cases like this the path from real-world phenomena to people expressing those phenomena online is obvious, and the danger of that expression being highly skewed or disproportionate is low. Hence, early detection using social media is viable.