What Took Facebook So Long?

Scholars have been sounding the alarm about data-harvesting firms for nearly a decade. The latest Cambridge Analytica scandal shows it may be too late to stop them.

A window advertisement that says “Like us on Facebook” (Brian Snyder / Reuters)

On Friday night, Facebook suspended the account of Cambridge Analytica, the political-data company backed by the billionaire Robert Mercer that consulted on both the Brexit and Trump campaigns.

The action came just before The Guardian and The New York Times dropped major reports in which the whistleblower Christopher Wylie alleged that Cambridge Analytica had used data that an academic had improperly exfiltrated from the social network. These new stories, backed by Wylie’s account and internal documents, followed years of reporting by The Guardian and The Intercept about the company’s possible misuse of Facebook data.

The details can seem byzantine. Aleksandr Kogan, then a Cambridge academic, founded a company, Global Science Research, and immediately took on a major client, Strategic Communication Laboratories, which eventually gave birth to Cambridge Analytica. (Steve Bannon, an adviser to the company and a former senior adviser to Trump, reportedly picked the name.)

The promise of Kogan’s company was that it could build psychological profiles of vast numbers of people by using Facebook data. Those profiles, in turn, might be useful to tune the political messages that Cambridge Analytica sent to potential voters. Perhaps a certain kind of message might appeal more to extroverts, or narcissists, or agreeable people.

To gather that data, the Times reports, Kogan hired workers through Amazon’s Mechanical Turk to install a Facebook app in their accounts. The app, built by Global Science Research, requested an unusual (but not unheard-of) amount of data about users themselves and their friends. That’s how 270,000 Turkers ended up yielding 30 million profiles of American Facebook users that could be matched with other data sets.
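As a rough back-of-the-envelope check on those figures, the sketch below divides the reported number of profiles by the reported number of app installers. It assumes, as a simplification not stated in the reporting, that the profiles are spread evenly and that friend lists do not overlap; the variable names are illustrative only.

```python
# Figures as reported in the article.
installers = 270_000          # Mechanical Turk workers who installed the app
profiles = 30_000_000         # American Facebook profiles reportedly collected

# Average number of profiles yielded per installer (installer plus friends),
# assuming an even spread and no overlapping friend lists.
profiles_per_installer = profiles / installers

print(round(profiles_per_installer))  # 111
```

In other words, each person who agreed to install the app exposed, on average, more than a hundred people who did not.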

From the current reporting, it seems that Kogan violated Facebook’s terms of service by saying he was using the data for academic research and then selling it to Strategic Communication Laboratories. That’s what got Cambridge Analytica and Kogan in trouble. (Cambridge Analytica told The Guardian that it does not possess the data and did not use any of it in the 2016 election. An anonymous source in the Times story disputes this.)

There’s a lot about Cambridge Analytica that doesn’t quite add up. Are they data geniuses who swung the Brexit vote and got Trump elected, or pretenders bluffing their way to fat marketing contracts? Right after the election, several stories pointed to their psychological profiles of voters as a crucial piece of the Trump digital machine. As time has gone on, their role has come to be seen as less important, more in line with the tiny slice of the Trump campaign treasury that they got, roughly $6 million.

While the specifics of this particular violation are important to understand, the story reveals deeper truths about the online world that operates through and within Facebook.

First, some of Facebook’s growth has been driven by apps, which the company found extended the amount of time that people spent on the platform, as retired users of FarmVille could attest. To draw developers, Facebook had quite lax (or, as one might say, “developer-friendly”) data policies for years.

Academic researchers began publishing warnings that third-party Facebook apps represented a major possible source of privacy leakage in the early 2010s. Some noted that the privacy risks inherent in sharing data with apps were not at all clear to users. One group termed our new reality “interdependent privacy,” because your Facebook friends, in part, determine your own level of privacy.

For as long as apps have existed, they have asked for a lot of data and people have been prone to give it to them. Back in 2010, Penn State researchers systematically recorded what data the top 1,800 apps on Facebook were asking for. They presented their results in 2011 with the paper “Third-Party Apps on Facebook: Privacy and the Illusion of Control.” The table below shows that 148 apps were asking for permission to access friends’ information.

Pennsylvania State University

But that’s not the only way that friends leak their friends’ data. Take the example of letting an app see your photos. As the Penn State researchers show, all kinds of data can be harvested: who’s tagged in photos, who liked any of the pictures, who commented on them, and what they said.

If one were to systematically crawl through all the data that could be gleaned from just a user’s basic information, one could build a decent picture of that person’s social world, including a substantial amount of information about their friends.
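To make the idea of “interdependent privacy” concrete, here is a minimal sketch of how one consenting user’s photos can expose people who never installed anything. The record layout (`tagged`, `liked_by`, `comments`) and the function `friend_exposure` are hypothetical, invented for illustration; they are not Facebook’s API or the Penn State researchers’ method.

```python
from collections import defaultdict


def friend_exposure(user, photos):
    """Map every other person to the kinds of activity one user's photos reveal."""
    exposed = defaultdict(set)
    for photo in photos:
        for person in photo.get("tagged", []):
            if person != user:
                exposed[person].add("tagged")
        for person in photo.get("liked_by", []):
            if person != user:
                exposed[person].add("liked")
        for person, _text in photo.get("comments", []):
            if person != user:
                exposed[person].add("commented")
    return dict(exposed)


# Made-up photo records for a single user, "alice", who granted photo access.
photos = [
    {"tagged": ["alice", "bob"], "liked_by": ["carol"],
     "comments": [("bob", "Great trip!")]},
    {"tagged": ["alice"], "liked_by": ["bob", "dave"], "comments": []},
]

exposure = friend_exposure("alice", photos)
for person in sorted(exposure):
    print(person, sorted(exposure[person]))
```

Only alice granted access in this toy example, yet the result contains entries for bob, carol, and dave, none of whom were asked.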

Facebook has tightened up some of its policies in recent years, especially around apps accessing friends’ data. But The Guardian’s reporting suggests that the company’s efforts to restuff Pandora’s box have been lax. Wylie, the whistleblower, received a letter from Facebook asking him to delete any Facebook data nearly two years after the existence of the data was first reported. “That to me was the most astonishing thing,” Wylie told The Guardian. “They waited two years and did absolutely nothing to check that the data was deleted. All they asked me to do was tick a box on a form and post it back.”

But even if Facebook were maximally aggressive about policing this kind of situation, what’s done is done. It’s not just that the data escaped, but that Cambridge Analytica almost certainly learned everything they could from it. According to The Guardian, the contract between Global Science Research (GSR) and Strategic Communication Laboratories specifies: “The ultimate product of the training set is creating a ‘gold standard’ of understanding personality from Facebook profile information.”

It’s important to dwell on this. It’s not that this research was supposed to identify every U.S. voter just from this data, but rather to develop a method for sorting people based on their Facebook profiles. Wylie believes that the data was crucial in building Cambridge Analytica’s models. It certainly seems possible that once the “training set” had been used to learn how to psychologically profile people, this specific data itself was no longer necessary. But the truth is that no one knows if the Kogan data had much use out in the real world of political campaigning. Psychological profiling sounds nefarious, but the way that Kogan and Cambridge Analytica first attempted to do it may well have proven, as the company maintains, “fruitless.”

So what is to be done? It’s possible that these new stories will cause Facebook to restrict the use of its data by people outside the company, including legitimate researchers. But that kind of self-imposed or external regulation would not strike at what’s actually scary about these efforts.

If Cambridge Analytica’s targeted advertising works, people worry they could be manipulated with information—or even thoughts—that they did not consent to giving anyone. And societally, a democracy running on micro-targeted political advertisements tuned specifically for ever tinier slices of the population is in trouble, as scholars like Zeynep Tufekci warned in 2012 (and in 2014).

Those two concerns extend far beyond Cambridge Analytica. In fact, the best system for micro-targeting ads, political or otherwise, to particularly persuadable segments of the population is Facebook itself. This is why Facebook’s market value is half a trillion dollars.

In Facebook’s ad system, there are no restrictions on sending ads to people based on any “targetable” attribute, like older men who are interested in the “Confederate States of America” and the National Rifle Association and who are “likely to engage with political content (conservative).”

A screenshot from Facebook’s ad-purchasing platform (Alexis Madrigal)

That’s to say nothing of the ability to create databases of people from other sources (electoral rolls, data on purchasing habits or group affiliations, or anything gleaned by the hundreds of online data companies) and then let Facebook itself match those people up to their Facebook accounts. Facebook might never reveal the names in an audience to advertisers or political campaigns, but the effects are the same.

Facebook’s laxity and the researcher’s malfeasance are newsworthy. But is the problem with privacy-obviating social networks, psychological profiling, and political micro-targeting that some researcher violated Facebook’s terms of service? Or is it that this controversy estranges the whole enterprise, providing a route to approach the almost unthinkable changes that have come to democratic processes in the Facebook era?