Culotta's software is remarkably efficient. While the CDC produces weekly estimates for disease outbreaks, those reports typically lag a week or two behind because of the data collection process. Culotta's program is lightning fast and entirely automated, analyzing the bulk of each day's Twitter messages and producing an estimate of the current proportion of people with the flu. Culotta compares his software to Google Flu Trends, which uses aggregated search queries to estimate flu activity. "The basis is different," Culotta told me. "They're looking at what people put in a search box while we look at Twitter, but we get pretty comparable results."
But Twitter may prove more useful than Google Flu Trends in disaster or conflict zones. The ubiquity of Twitter, and its frequent use on mobile phones, suggests that the short-form service may be the medium of choice during an unforeseen disaster like the February 2010 earthquake in Chile, or when Internet access is shut down or restricted, as in Iran during the 2009 election protests. "When normal ways of collecting data are out of service, informal mediums become more appealing," Culotta said.
Twitter-analyzing software and other forms of algorithmic forecasting have their limitations, though. To determine the accuracy of his model, Culotta matches his data against existing CDC flu trends in the United States. In disaster-stricken or post-conflict areas where the CDC or other global health organizations lack baseline data on infectious diseases, validating the accuracy of a projection is problematic.
Similarly, the information introduced into the Twitter ecosystem may not always reflect the actual spread of disease. "If you have a lot of events that are extremely rare, you can track them and compare them to baseline ... maybe it wouldn't be cholera stats, but actual false alarms," Culotta said. "You could run the software without validating it and say, compared to past Twitter data, 'here's a big spike' ... but you never know if you've seen these spikes before and whether or not they represent a spike or some false alarm." Culotta pointed to Google Flu Trends as an example. "What about recalls for flu medication? A big spike in searches for a certain query may produce the same sets of keywords that software associates with a spike." Other constraints are simpler: "How many Haitians use Twitter or Google?"
Despite these pitfalls, Culotta maintains that his Twitter-analyzing software may play an integral role in disaster recovery, especially in containing the spread of infectious diseases in devastated areas. "In real time in, say, Miami, you can tell the CDC to look closer. But if there's no CDC, then yeah, it's a valuable alternative," Culotta said. "It's impressive and amazing that we're able to correlate so closely with CDC stats. We're just beginning to learn exactly how valuable this information is. The stereotypically menial messages, people talking about their coffee, are actually very important, and we can pull that information out of them."