In a nondescript building in Virginia, analysts are tracking millions of tweets, blog posts, and Facebook updates from around the world.
How stable is China? What are people discussing and thinking in Pakistan? To answer these sorts of questions, the U.S. government has turned to a rich source: social media.
The Associated Press reports that the CIA maintains a social-media tracking center operated out of a nondescript building in a Virginia industrial park. The intelligence analysts at the agency's Open Source Center, whom other agents refer to as "vengeful librarians," are tasked with sifting through millions of tweets, Facebook messages, online chat logs, and other public data on the World Wide Web to glean insights into the collective moods of regions or groups abroad. According to the Associated Press, these librarians are tracking up to five million tweets a day from places like China, Pakistan and Egypt:
From Arabic to Mandarin Chinese, from an angry tweet to a thoughtful blog, the analysts gather the information, often in native tongue. They cross-reference it with the local newspaper or a clandestinely intercepted phone conversation. From there, they build a picture sought by the highest levels at the White House, giving a real-time peek, for example, at the mood of a region after the Navy SEAL raid that killed Osama bin Laden or perhaps a prediction of which Mideast nation seems ripe for revolt.
Yes, they saw the uprising in Egypt coming; they just didn't know exactly when revolution might hit, said the center's director, Doug Naquin. The center already had "predicted that social media in places like Egypt could be a game-changer and a threat to the regime," he said in a recent interview with The Associated Press at the center. CIA officials said it was the first such visit by a reporter the agency has ever granted.
The CIA facility wasn't built specifically to track the ebb and flow of social media: The program was established in response to a recommendation by the 9/11 Commission with the initial mandate to focus on counterterrorism and counterproliferation. According to the Associated Press, the center shifted gears and started focusing on social media after watching thousands of Iranian protesters turn to Twitter during the Iranian election protests of 2009, challenging the results of the elections that put Iranian President Mahmoud Ahmadinejad back in power.
In the past few years, sentiment and mood analysis have become mainstays in the defense and intelligence communities. Last October, an Electronic Frontier Foundation lawsuit revealed that the Department of Homeland Security has for years carefully monitored a variety of public online sources, from social networks to highly popular blogs like Daily Kos, alleging that "leading up to President Obama's January 2009 inauguration, DHS established a Social Networking Monitoring Center (SNMC) to monitor social-networking sites for 'items of interest.'" In August, the Defense Advanced Research Projects Agency (DARPA) invited analysts to submit proposals on the research applications of social media to strategic communication. DARPA planned on shelling out $42 million in funding for "memetrackers" to develop "innovative approaches that enable revolutionary advances in science, devices, or systems."
But how useful is all of this activity?
"You have little control over the composition of a sample," Bollen explained. "Regular surveys are conducted with only 1000 people, but those samples are carefully balanced to provide an accurate cross section of a given society. This is much more difficult to do in these online environments. Sure, the samples are huge -- there are 750 million people on Facebook -- but no matter how you look at it, it's still possible that the sample could still be biased. It requires someone to own a computer, to be on Facebook, to even USE Facebook... There are all kinds of biases built into these samples that are difficult to control for."
The other major challenge, says Bollen, is that sentiment analysis only provides a scrap of potentially useful information. "Right now, analysis is very specialized. We're looking at how people feel about very particular topics," says Bollen. "There's a lot of room for growth in deeper semantic analysis: not just learning what people feel about something, but what people think about things. There are 250 million people on Twitter... if you could perform even a shallow analysis of people's opinions about something, their semantic opinions, you can learn a lot from the wisdom of the crowd that could be leveraged."
Diving deep into the semantics of online communication is the next big challenge for government agencies. While the Associated Press points out that the CIA uses native dialects to determine sample sizes and pinpoint trending topics among target groups, deciphering the intricacies of human language is a major obstacle, and one that will not be easily overcome.