This new paper takes a far grander scale, looking at nearly the entire lifespan of Twitter: every piece of controversial news that propagated on the service from September 2006 to December 2016. But to do that, Vosoughi and his colleagues had to answer a more preliminary question first: What is truth? And how do we know?
It’s a question that can have life-or-death consequences.
“[Fake news] has become a white-hot political and, really, cultural topic, but the trigger for us was personal events that hit Boston five years ago,” says Deb Roy, a media scientist at MIT and one of the authors of the new study.
On April 15, 2013, two bombs exploded near the route of the Boston Marathon, killing three people and injuring hundreds more. Almost immediately, wild conspiracy theories about the bombings took over Twitter and other social-media platforms. The mess of information grew only more intense on April 19, when the governor of Massachusetts asked millions of people to remain in their homes as police conducted a huge manhunt.
“I was on lockdown with my wife and kids in our house in Belmont for two days, and Soroush was on lockdown in Cambridge,” Roy told me. Stuck inside, Twitter became their lifeline to the outside world. “We heard a lot of things that were not true, and we heard a lot of things that did turn out to be true” using the service, he said.
The ordeal soon ended. But when the two men reunited on campus, they agreed it seemed seemed silly for Vosoughi—then a doctoral student focused on social media—to research anything but what they had just lived through. Roy, his adviser, blessed the project.
He made a truth machine: an algorithm that could sort through torrents of tweets and pull out the facts most likely to be accurate from them. It focused on three attributes of a given tweet: the properties of its author (were they verified?), the kind of language it used (was it sophisticated?), and how it propagated through the network.
“The model that Soroush developed was able to predict accuracy with a far-above-chance performance,” Roy said. He earned his Ph.D. in 2015.
After that, the two men—and Sinan Aral, a professor of management at MIT—turned to examining how falsehoods move across Twitter as a whole. But they were back not only at the “What is truth?” question but its more pertinent twin: How does the computer know what truth is?
They opted to turn to the ultimate arbiter of fact online: third-party fact-checking sites. By scraping and analyzing six different fact-checking sites—including Snopes, Politifact, and FactCheck.org—they generated a list of tens of thousands of online rumors that had spread from 2006 to 2016 on Twitter. Then they searched Twitter for these rumors, using a proprietary search engine owned by the social network called Gnip.
Ultimately, they found about 126,000 tweets, which, together, had been retweeted more than 4.5 million times. Some linked to “fake” stories hosted on other websites. Some started rumors themselves, either in the text of a tweet or in an attached image. (The team used a special program that could search for words contained within static tweet images.) And some contained true information or linked to it elsewhere.