Containing a dengue outbreak is a data-driven game of whack-a-mole. There’s no cure or vaccine, so health workers typically focus on other means of preventing the disease. They poison the mosquitoes that spread it, and remove the stagnant water in which the insects breed. To do that effectively, workers really need to know exactly where the disease is rearing its head. “Getting access to that data in a developing country isn’t easy,” says Subramanian. When crowds are thick and resources are thin, data comes neither readily nor accurately.
One alternative is to use indirect data sources like helpline calls or internet search queries. As outbreaks begin, people start searching for information about why they’re feeling sick, and these searches could potentially be used to track diseases. That was the intuitive logic behind Google Flu Trends, a much-hyped way of predicting flu outbreaks by mining search queries. Unfortunately, it grossly overestimated flu levels in America three years in a row, and became known as a “poster child of the foibles of big data.” Telephone hotlines haven’t fared much better. Studies have found that call volumes correlate with flu levels in some regions but not others, making them too unreliable as a means of surveillance over large scales.
At first, that’s also what Rehman’s team found. They showed that the number of calls to their hotline correlated with the number of dengue patients in hospital a few weeks later—but only across Lahore as a whole, and not at finer scales.
The problem is that the search for information is driven by awareness as well as need. “News articles or public awareness campaigns can increase internet searches for a disease, which can limit the usefulness of this type of data,” says Hannah Clapham, an epidemiologist at the Oxford University Clinical Research Unit in Vietnam.
Rehman’s team realized this, and they knew how to deal with it. After the 2011 epidemic, the government of Punjab launched a string of awareness campaigns to teach people about symptoms, prevention measures, and the hotline itself. The team had information about the timing and location of these activities, so when they created a statistical model to predict dengue levels based on call volumes, they added data on awareness levels too. And for good measure, they included weather conditions that influence the lives of mosquitoes, like rainfall, temperature, and humidity.
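To make the idea concrete, here is a minimal sketch of a model that predicts later patient counts from call volumes while adjusting for awareness activity and weather. The article does not say what kind of model the team used, so ordinary least squares stands in for it here. The function names, the choice of three features, and every number in the data are invented for illustration.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row swaps keep both sides aligned.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows: Sequence[Sequence[float]],
                      targets: Sequence[float]) -> List[float]:
    """Return [intercept, w1, w2, ...] minimizing squared prediction error."""
    design = [[1.0, *row] for row in rows]  # prepend a constant column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights: Sequence[float], row: Sequence[float]) -> float:
    """Apply fitted weights to one (calls, awareness, rainfall) row."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


# Hypothetical weekly records for one town (all values invented):
# (hotline calls, awareness events held, rainfall in mm)
ROWS = [
    (120, 0, 10),
    (200, 3, 12),
    (150, 1, 30),
    (300, 5, 25),
    (180, 0, 40),
    (260, 2, 5),
]
# Hypothetical dengue patients admitted a few weeks after each record.
PATIENTS = [34.0, 27.0, 42.0, 37.5, 61.0, 43.5]

weights = fit_least_squares(ROWS, PATIENTS)
forecast = predict(weights, (220, 1, 20))
```

In this invented data set the fitted weight on awareness events comes out negative: weeks with many campaigns produce extra calls that do not translate into extra patients, so the model discounts them. Refitting on the latest records each time new data arrives would be one simple way to approximate a model that keeps retraining.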
With these factors accounted for, the model predicted the future numbers of dengue patients in Lahore’s 10 component towns with an average accuracy of 86 percent. The team then set up an app that allowed public health workers to check the model’s predictions in real-time, using their government-issued phones. They could spray insecticides or clean up stagnant water at specific places to contain the spread of the disease.
The fact that public health workers are actually using the system “enables evaluation in real-time,” says Elaine Nsoesie, a professor of global health at the University of Washington. “Like other systems using non-traditional data sources, there is always a need for continuous maintenance and re-evaluation.” Indeed, Subramanian notes that their model isn’t static. It continuously retrains itself as new data comes in.