Algorithms Can Help Stomp Out Fake News

Tools that assist fact-checkers in catching bogus information could be deployed within a year.

Fake news articles from a website registered in Veles, Macedonia (Raphael Satter / AP)

This week, a BuzzFeed survey found that three in four American adults who see fake-news headlines believe them. It’s not hard to see why: A website peddling made-up news stories can easily look nearly as polished as The New York Times, and it’s impossible to keep up with the sheer volume of information published online every minute. And when people believe fake news stories, real things happen, like an assault rifle-wielding man visiting a D.C. pizza joint because he appeared to think the restaurant was involved in a debunked Clinton-centered pedophilia conspiracy.

Fact-checkers need a hand if they’re going to catch up with the pace and breadth of the material shared every day on Facebook, Twitter, and elsewhere around the internet. As soon as next year, they might get that help—in the form of computer algorithms and artificial intelligence.

There are several ways to determine whether a story is true or not, says Carlos Castillo, a data scientist at a research center in Spain called Eurecat. The simplest is just to consider the source: If the story was published in a prestigious newspaper, for example, or by a decorated journalist, it’s probably more likely to be trustworthy. Another method is to study the way a story is shared on social media: the kinds of words used to describe it, the sorts of users who post it, and the way people respond to it. And a third method is to examine the story itself, by analyzing its internal logic, combing it for claims, and checking those claims against known facts.

Computers can help test stories in all three of these ways. Vetting stories solely by the publications they appear in risks oversimplifying the web by boiling it down into just “good” and “bad” sources—but it’s very easily done. New York magazine’s Brian Feldman cooked up a Chrome extension that uses a modified version of a media professor’s list of “fake, false, regularly misleading, and otherwise questionable ‘news’ organizations” to blacklist certain domains on the web. When you visit one of the blacklisted sites with the extension installed, your browser pops up a warning message. (For a taste of why this sledgehammer approach isn’t ideal, scroll through some of the comments on the extension’s download page.)
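To see how little machinery source-based vetting needs, here is a minimal sketch in Python. It is not the extension’s actual code: the domain names are placeholders and the function name is invented for illustration. It simply checks whether a page’s host, or any parent domain of that host, appears on a list.

```python
from urllib.parse import urlparse

# Hypothetical list; these domains are placeholders, not real entries
# from the list the extension uses.
BLACKLISTED_DOMAINS = {"fake-news.example", "hoax-daily.example"}


def is_flagged(url: str) -> bool:
    """Return True if the URL's host, or any parent domain, is on the list."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Check "a.b.c", then "b.c", then "c", so subdomains are caught too.
    return any(".".join(parts[i:]) in BLACKLISTED_DOMAINS for i in range(len(parts)))


if __name__ == "__main__":
    for link in ("http://www.fake-news.example/story", "https://www.nytimes.com/"):
        print(link, "->", "warn" if is_flagged(link) else "ok")
```

The bluntness is visible in the code itself: every page on a listed domain gets the same warning, whatever the individual story says.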

Watching a story or a piece of information make its way across a social network is a more nuanced way of evaluating its trustworthiness, but it also demands more resources. This is where computers really come in handy. In 2010, a trio of researchers at Yahoo (including Castillo, who worked there at the time) studied how Twitter users in Chile responded to an enormous earthquake that rattled some of the country’s most populated regions. They found that tweets containing false rumors were far more likely to be shared skeptically (that is, with commentary that denies the rumor or questions it) than tweets that contained confirmed truths.
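A rough sense of that signal can be had with a few lines of Python. The sketch below is an illustration only, not the researchers’ method: the marker phrases and the function name are assumptions, and a real system would need a far richer model of doubt and denial than keyword matching.

```python
# Hypothetical marker phrases, chosen for illustration only.
SKEPTIC_MARKERS = ("fake", "false", "hoax", "not true", "is this true", "really?")


def skeptical_share(tweets: list[str]) -> float:
    """Fraction of tweets containing at least one doubting or denying phrase."""
    if not tweets:
        return 0.0
    doubtful = sum(
        any(marker in tweet.lower() for marker in SKEPTIC_MARKERS)
        for tweet in tweets
    )
    return doubtful / len(tweets)


if __name__ == "__main__":
    sample = [
        "Is this true? Bridge collapsed downtown",
        "Bridge collapsed downtown!",
        "That is FALSE, the bridge is fine",
        "wow",
    ]
    print(skeptical_share(sample))
```

A story whose shares skew heavily skeptical would be a candidate for a closer look by a human, not an automatic verdict.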

In a follow-up paper published the next year, the researchers broadened their findings to show the sorts of characteristics that indicate a piece of information being shared is true: Factually accurate tweets about a news story are generally retweeted by users who’ve tweeted a lot in the past, for example, and by users who have more followers. Another study, this one from researchers at Indiana University Bloomington, found that examining the relationships between users that tweet a particular story can reveal whether it’s spreading organically, or if it’s being helped along by bots and offline collusion.

“The task here is to discover anomalies in the way a content is propagating that makes it different from the way in which real contents propagate,” Castillo wrote in an email.

(Facebook is using a version of social-media vetting on its site right now, as I detailed in a story last month. It uses algorithms to look for posts that get a lot of pushback from a person’s Facebook friends—think comments with links to Snopes articles debunking the poster’s claim—and suppresses future posts with the same link. Mark Zuckerberg has written that Facebook is considering implementing “technical systems to detect what people will flag as false before they do it themselves,” but a spokesperson for the company wouldn’t comment on its progress toward that goal.)

Rather than waiting for a rumor or a story to spread across the internet to tell whether or not it’s true, some engineers and researchers are trying to analyze news in real time. Full Fact, a UK-based fact-checking outfit, published a report this summer outlining the state of the art in fact-checking. It was optimistic about the future of computer-aided fact-checking, sporting a cover emblazoned with the phrase, “How to make fact-checking dramatically more effective with technology we have now.”

Will Moy, the director of Full Fact, says computers could start helping human fact-checkers with some of the more tedious parts of their job sometime next year. A perfect system is still far off, he said—but a “useful” system, one that can start identifying claims in a piece of writing, and make it easier for humans to check basic claims about statistics, is imminently attainable.

Automated fact-checking can take one of two approaches, Moy says. The first is to build a complex artificial-intelligence algorithm that can parse facts floating around in the vast world of unstructured data out on the internet; the second is to try and fit as many facts as possible into a database, and build simple search tools to call them up when they’re needed. Full Fact chose the latter approach, and Moy says he took cues from an unusual source to help make the decision.

“That’s effectively the same choice that Google’s made with its automated cars: Having lots and lots of good information about roads makes it a lot easier to build a driverless car,” he explained. “Having the data in structured, useful formats makes it much easier to do the actual checking part. So we’re trying to build the driverless car of fact-checking.”
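The database approach Moy describes can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the topics, the figures (which are invented), the tolerance, and the function name are not taken from Full Fact’s system.

```python
# Hypothetical store of verified statistics, keyed by (topic, year).
# The figures are invented for illustration only.
FACTS = {
    ("unemployment_rate_percent", 2015): 5.3,
    ("population_millions", 2015): 65.1,
}


def check_claim(topic: str, year: int, claimed: float, tolerance: float = 0.05) -> str:
    """Compare a claimed figure with the stored one, allowing a relative tolerance."""
    known = FACTS.get((topic, year))
    if known is None:
        return "unknown"  # nothing on file, so a human has to check it
    if abs(claimed - known) <= tolerance * abs(known):
        return "consistent"
    return "inconsistent"


if __name__ == "__main__":
    print(check_claim("unemployment_rate_percent", 2015, 5.4))
    print(check_claim("unemployment_rate_percent", 2015, 8.0))
    print(check_claim("gdp_growth_percent", 2015, 2.0))
```

The lookup itself is trivial. The hard part, as the quote suggests, is getting the facts into a structured form in the first place.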

Google is more than just an inspiration: Last month, the company’s Digital News Initiative gave Full Fact more than $50,000 to immediately start developing an automated fact-checking helper.

The organization’s focus on standardization also makes collaboration easier. Instead of having every newsroom and fact-checking organization build its own, redundant system, a common database could help fact-checkers around the world contribute to a knowledge base that they could all draw from.

Further down the line, computers will likely be able to take over more and more responsibilities from human fact-checkers. Last year, a group of researchers at Indiana University came up with a system—the lead researcher, Giovanni Ciampaglia, called it a “proof of concept”—that uses links between Wikipedia pages to determine whether a simple statement of fact is true or not. It can evaluate a statement like “Obama is Muslim” by finding the shortest path between the page for “Obama” and “Muslim,” using only the factual information in the box of statistics found on major articles. That particular path is long and tortured, passing through very generic pages like “Canada” along the way, so the system assigns the statement a low truth value.
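The general idea of scoring a statement by the path between two concepts can be illustrated with a toy graph in Python. This is a simplified stand-in, not the researchers’ system: the graph is invented, and the scoring formula (which penalizes paths through heavily linked, generic nodes) is an assumption made for this sketch.

```python
import math
from collections import deque

# Toy, hand-made graph of linked concepts (undirected adjacency lists).
GRAPH = {
    "Rome": ["Italy"],
    "Italy": ["Rome", "Europe"],
    "Europe": ["Italy", "France", "Spain", "Germany"],
    "France": ["Europe", "Paris"],
    "Paris": ["France"],
    "Spain": ["Europe"],
    "Germany": ["Europe"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the nodes on a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def truth_score(graph, subject, obj):
    """Score in (0, 1]; lower when the path is long or crosses generic nodes."""
    path = shortest_path(graph, subject, obj)
    if path is None:
        return 0.0
    # Assumed penalty: log of the link count of every intermediate node.
    penalty = sum(math.log(len(graph[node])) for node in path[1:-1])
    return 1.0 / (1.0 + penalty)


if __name__ == "__main__":
    print(truth_score(GRAPH, "Rome", "Italy"))  # direct link
    print(truth_score(GRAPH, "Rome", "Paris"))  # long path through "Europe"
```

In this toy graph, a pair joined by a direct link gets the top score, while a pair connected only through a broad hub like “Europe” scores much lower, mirroring the “long and tortured” path described above.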

But computers will never completely take over the role of humans in the fact-checking process. I asked my colleague Amy Weiss-Meyer, who fact-checks magazine articles for The Atlantic, about where algorithms would be most useful in her job—and where they probably wouldn’t be able to help. She said computers could be useful for checking basic facts and figures, but that she’d be suspicious of their ability to evaluate nuances in human writing, like unraveling the assumptions underpinning an argument, or assessing new ways of describing complex scientific processes.

Moy believes that computers will get better at the easy stuff over time, so that human fact-checkers can focus on the most complex parts of their jobs.

“If we can clear away the clutter of simple facts, then we can start working on how to explain complicated judgments that rely on bringing together lots of information, and then making choices about how to weigh that information,” he said. “There’s a lot for human fact-checkers to do. It shouldn’t be the mechanics of looking up numbers in spreadsheets; it should be helping people make choices about important factual issues.”