On Google and Facebook: 'The Finest Intelligence Operation on Earth'

On Tuesday, Mark Zuckerberg of Facebook finally appears before Congress. Franklin Foer, who has extensively chronicled the relationship between social-media companies and democracy, had a report yesterday on the phase-change in national power that his appearance might indicate. (And you can take an advance look at Zuckerberg’s prepared testimony, highly underwhelming in my view.)

Last week I ran a long dispatch by my friend Michael Jones, one of the inventors of Google Earth and former “Chief Technology Advocate” at Google, arguing the difference (as he had seen it) between the Google and the Facebook approach to customer data. In short: Both companies based their business on achieving a more and more precise understanding of who their users were. But, Jones said, there was a big difference in how they protected the information, Google being more intent on making sure the “Personally Identifiable Information,” PII, never left its own control.

His argument attracted a lot of discussion on Twitter, from some past and present employees of Facebook and some other figures. I’ve also heard many dissenting (and supporting) views.

The purpose of this post is to quote a few of the dissents, and a reply from Michael Jones.

First, from someone within the Facebook world:

[A relative who works for Facebook] told me: "The article is mostly false in regards to what advertisers get. It makes it seem like the advertiser knows your every move—as well as your kids. Obviously FB and Google have tons of PII, but it really only gets shared with advertisers in aggregate, not one by one. Unless the user volunteers it [i.e. fills out a form to provide their information to the advertiser]".

I agree with this. There are two cases to consider here: Facebook Advertisers and Facebook Applications.

Facebook Advertisers: When an advertiser uses FB (or Google) to post ads, very granular micro-targeting can be done in order to deliver the ads/content to a very specific audience. The advertiser doesn't know who the ads are getting delivered to; all of that information is held by FB (or Google). In this area, Google and FB are identical.

Facebook Applications: Facebook provides a mechanism to allow Facebook users to use their FB identity (i.e. username/pwd) to access other applications. This way, the user just has to remember their one identity. When the user first sets this up for a specific application, they are told that when doing this, their FB information is going to be provided to the application (and it lists the type of information that is going to be accessible by the app). Several years ago, the application could receive not just the user's information, but the information for the user's FB Friends as well. This type of access (friends) was discontinued.

Now to the "data breach" topic. This was a case involving a FB Application (or really *all* FB applications). The professor at Cambridge built a survey application that was accessed via Facebook as part of their research. There were a large number of users that completed the survey (~270,000), each one knowingly (explicitly) granting access to their information for the survey app.

When the survey app downloaded the user info, they received not just the survey users' data, but their friends' data as well. That is how they came to receive such a large data set (270,000 users * ~320 friends per user = 87M fb users). Again, this practice of allowing the "friends" data to be downloaded was stopped several years ago. Today, the only data accessible to a FB app is the data for the users that have granted access to the app. [JF note: I disagree on this point. The “permissions” for a range of apps have been buried within Facebook’s settings, and were turned on by default. Most users would “grant” this access without realizing they were doing so.]
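The reader's multiplication can be checked directly. This is a minimal sketch that uses only the two rounded figures quoted in the note; the variable names are illustrative, and the friend count is the reader's approximation rather than an official Facebook number.

```python
# Figures quoted in the reader's note; both are approximations.
survey_users = 270_000      # people who completed the survey app
friends_per_user = 320      # rough average friend count cited by the reader

# Upper-bound estimate: assumes no two survey users share a friend.
estimated_reach = survey_users * friends_per_user

print(f"{estimated_reach:,} profiles (~{estimated_reach / 1e6:.1f}M)")
```

The product comes to about 86.4 million, in line with the roughly 87 million figure the reader cites.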

So back to the article. Michael Jones's claim that "To be clear, THE USER INFORMATION LEAVES FB AND GOES TO THE ADVERTISER/ POLITICAL MANIPULATOR/ whatever." is false. Advertisers have no access to specific user data. It works exactly like Google. Advertisers specify the parameters of who should receive the ads/content and FB executes that. It is only FB applications that receive user data, and the user knowingly and explicitly grants permission to the app to receive this information.

Aside from this, I do agree that regulation is probably in order for all advertising, and should especially be there for the direct sharing of user data (i.e. the FB app case).

The conflation by the media of these two cases (advertising and apps) is obviously not good, and of course various parties are using it to their political/business advantage.

* * *

As for the root of this issue, I would claim that the core issue stems from the collection and recording of the data to begin with. Once collected, it can and will be used for ‘evil’ in addition to ‘good’. This is true at our Intelligence Agencies and it is true here as well.

A fundamental worldview is at play. Mark Z has the view that humanity is basically good (Humanist). [JF note: I’m not so sure about this, either.] This then leads to a particular level of trust afforded to users/advertisers/developers/employees/etc., MZ feeling that people will use the platform for ‘good’ and that they can prevent the ‘bad actors’ from doing ‘evil’.

For example, FB (the company itself) is very open, providing access to all of its data to all employees. It is surprising to anyone when they begin to work there given the normal closed/secret culture of Silicon Valley.

Unfortunately this view is not correct. We are surrounded by the facts of history and day to day life that show that humanity is not good by nature. Lustful desires for sex, power, position, and possessions are at the core of the human ‘heart’. This nature cannot be trusted, because it will and does act.

Our Founding Fathers knew this well and put into place checks and balances to protect against it. As long as MZ holds to his ‘people are basically good’ view, FB and its users will be at risk and will continue to be exploited.

In contrast, Apple’s firm stance on not having encryption keys or other ‘back door’ methods for accessing the encrypted data on an iPhone recognizes this nature. If they have the method, someone sometime will use it for ‘evil’.

And, from another reader who, like Michael Jones, has worked at Google—and more recently than Jones has:

One thing I want to put out there is that Facebook was clearly, structurally playing fast and loose with people's personal information.

The News Feed, for example, used not to exist. It was a new use of Facebook users' personal information; while it wasn't a violation of privacy, it was a violation of expectations because what you did on Facebook was suddenly being actively pushed out to your friends. And people got upset. But people got over it and eventually seemed to agree with Facebook that it was better to have the News Feed.

Facebook's Graph API—what Cambridge Analytica used to siphon out everyone's data—was, to us in the industry, an audacious disclosure of your friends' personal data without their informed consent. But again, no one seemed to care.

Facebook learned from this that people don't care. But it wasn't just Facebook that learned this. Everyone in the industry could see it and decided "full speed ahead," which meant different things to different industry participants. Now, 12 years later, Facebook may be paying some price. But for all those years, if you wanted to promote data silos like Jones described, that's what you had to fight.

I'd further add [also for quotation] that all those years ago, Facebook was a small upstart while Google was already a behemoth in internet advertising. The fact that Google would not share or sell your personal data was a real challenge for Facebook, and Facebook had to work continuously and cleverly to build a powerful ad system without Google data. Which they did.

What disturbs me is that, if we weren't able to predict where we were going to end up on this road (and we weren't), and if we don't particularly like where we are (and many of us don't), it's far from clear how to regulate in a way that addresses people's real concerns and doesn't just entrench the existing players.

Now, back to Michael Jones, for a follow-up to his original message and a response (beyond his Twitter dispatches) to his Google/Facebook comparisons:

Thanks for reaching out for a follow-up. I confess that I under-appreciated how strongly people feel about this topic.

When I wrote to you I focused on the "what the risk is" and not so much on the instructions on how to accomplish harm in the Facebook apps and advertising models. As I've seen in questions in the Twitter feed—doubts and disagreements that I answered clearly and conclusively—there are some people who misunderstood the precise phrasing.

This is my fault, as I should have been clearer and more detailed to prevent misunderstanding. Glad to have a chance to clear this up, though I fear that the details will cause most readers' eyes to glaze over.

Based on the Twitter feedback, readers including Facebook employees realize that the Facebook API and the apps that live on it do indeed have access to private user information. Apps had more freedom in the past but still retain enough risk of data exfiltration by app developers that Facebook is having to scrutinize them one by one looking for other than the publicly acknowledged abusers among their millions of app developer customers. Even if it was just Cambridge Analytica, Facebook's business model was sufficient to start and enflame the firestorm that grows day by day in Congress, Wall Street, and around the world.

Where there seems to be disagreement with what I wrote to you, and very strong disagreement indeed, is in the realm of Facebook ads. (I wrote "...the advertiser can find out...") As stated above, I should have realized that with their app ecosystem under such investigation and suspicion, the need to protect the sacred cash-cow of advertising would expose me to the strongest counterpunch. Although I answered this in tiny Twitter dribbles, let me say it as clearly as I can here for the benefit of those who care enough to understand the mechanisms.

At Google (and Bing and Amazon and most every other web-advertising company) the user searches for "Cabo San Lucas" at Google or clicks on a pizza oven at Amazon and this is taken as intent on the part of the search engine, and is used to match that user (whoever they may be) with a matching ad.

This is called targeting. Google will give natural results and also information on hotels and flights; Amazon will tell you that customers who bought a pizza oven also bought a Pizza Cutter or a Pizza Tray. This is natural and comfortable because it correlates with your immediate expressed intent.

It is a different situation at Facebook because users are not generally directly searching for things this way. Instead, Facebook came up with a different (and very clever!) way to target users to match them with appropriate advertising.

What they did was to realize that they know your name and quite a bit more (city, state, address, education, past employers, etc.) from things that you've entered yourself or from people that you know (friends and family), and that this is enough to identify you in the many external commercial databases of credit history, purchases, property tax records, court cases, or other kinds of reports about people. Those records can be disambiguated ("yes, it is definitely THAT James Fallows") based on the kinds of information that Facebook knows about you internally.

The result is the finest intelligence operation on Earth. Facebook knows more about you than the FBI could in 52,000 categories from "breastfeeding in public" to "total liquid investible assets $1-$24,999" and "Away from family."

Now where the clever part comes in is that Facebook allows advertisers to micro-target ads, not just to Facebook users that "Hate Noise" but to "Hate Noise AND breastfeeding AND frequent transactor at lower cost department or dollar stores." This means that "women looking for inexpensive quiet leaf-blowers" can be targeted with extraordinary specificity. Genius!

However, and this is where I part with the people who argued about my "Facebook Ads Leak Private Information" belief, the story is a little more nuanced than just the extraordinary specificity that has made Facebook so wealthy. To understand this we must explain some nuance about web advertising and a little about binary arithmetic.

Imagine a quiet leaf blower ad, a photo and some text about the "QuietClean 1000" that is on my website and for which I contract ad display service from Facebook. Facebook chooses who to show the ad to, people click on it, and they are sent to a landing page on my website.

All I know is that they responded to the ad. I don't know their gender, sexuality, credit history, or anything else. This first-level cleanliness underlies the arguments of those who will write to disagree with me. That's fine, but that position is short of obligation number two in the courtroom's "the truth, the whole truth, and nothing but the truth."

The whole truth comes in when I create two ads, looking just alike maybe, but linked to two different places on my website. For ad #1, I do a micro-targeting of "only show to males" and for #2 it is "only show to females." Now, when Facebook uses its data (user entered, from friend links, from licensed credit data, etc.) the men are sent to link #1 and the women to #2. That means that as a Facebook user clicks on the ad, I have been told by Facebook "here comes a woman" before the QuietClean ad even appears on her screen.

Because computer ads are not like physical ads, it is not much harder to have 8 or 1024, or 16,777,216 different ads that differ only in the web address on my website, each of which is designed as follows:

When "sex=male" AND "bisexual=NO" => Link to Quiet1

When "sex=male" AND "bisexual=YES" => Link to Quiet2

When "sex=female" AND "bisexual=NO" => Link to Quiet3

When "sex=female" AND "bisexual=YES" => Link to Quiet4

This is termed an ad campaign. (You will often see this as one page with a web tag starting with a "?") It has the remarkable property that everyone who clicks on a QuietClean ad (as micro-targeted by Facebook) has their gender and an aspect of sexuality sent to me before they even see the ad (which I serve from my website).
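The four-rule campaign described here can be sketched in a few lines. This is only an illustration under stated assumptions: the attribute names, the `example.com` address, and the helper functions `build_campaign` and `decode_click` are all hypothetical, and nothing here calls or represents Facebook's actual ad tools. It shows how one landing URL per targeting combination lets the advertiser's own server read the targeting back from the link that was clicked.

```python
from itertools import product
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical targeting attributes; names and values are for illustration only.
ATTRIBUTES = {
    "sex": ["male", "female"],
    "bisexual": ["NO", "YES"],
}

BASE_URL = "https://example.com/quietclean"  # placeholder advertiser site


def build_campaign(attributes, base_url):
    """Return one (targeting rule, landing URL) pair per attribute combination."""
    names = list(attributes)
    campaign = []
    for values in product(*(attributes[name] for name in names)):
        rule = dict(zip(names, values))
        # The "?" query string is what distinguishes otherwise identical ads.
        campaign.append((rule, f"{base_url}?{urlencode(rule)}"))
    return campaign


def decode_click(landing_url):
    """Read back, on the advertiser's side, the rule encoded in a clicked URL."""
    query = parse_qs(urlparse(landing_url).query)
    return {key: values[0] for key, values in query.items()}


campaign = build_campaign(ATTRIBUTES, BASE_URL)
for rule, url in campaign:
    print(rule, "=>", url)
```

Calling `decode_click` on any of the four URLs returns the same attribute values that were used to target the ad, which is the leak the passage describes.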

Each yes or no question doubles the number of individual ads in the campaign. 1 question = 2 ads, 2 questions = 4, 3 = 8, ... and 24 questions = 16,777,216 ads, all of which look alike but are my part, as the advertiser, of paying Facebook to tell me if you're looking for a new partner, have been bankrupt, your income range, and maybe someday, medical history. This would be tedious in physical ads, but just a day's work for programmers working for advertisers whose ads you see on Facebook.
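The doubling is ordinary binary arithmetic: n yes/no questions need 2^n ad variants, and the position of an ad within the campaign can carry every answer. A minimal sketch, assuming a simple encoding in which bit i of the ad's index stores the answer to question i; the function names are hypothetical.

```python
def ads_needed(num_questions: int) -> int:
    """Number of ad variants needed to cover every yes/no combination."""
    return 2 ** num_questions


def answers_from_ad_index(ad_index: int, num_questions: int) -> list[bool]:
    """Recover the yes/no answers implied by an ad's position in the campaign.

    Assumed encoding: bit i of the index holds the answer to question i.
    """
    return [bool((ad_index >> i) & 1) for i in range(num_questions)]


for n in (1, 2, 3, 24):
    print(n, "questions ->", f"{ads_needed(n):,}", "ads")
```

With 24 questions the count is 16,777,216, matching the figure in the text, and a single clicked ad index decodes back into all 24 answers.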

As you see above, the 1 or 24 or more facts about you are shared with the advertiser. Facebook makes billions of dollars just this way, in part because advertisers enjoy being able to leech out the otherwise inaccessible private information in your credit report and other information from the unique interest categories that Facebook has exposed this way. (You can download the list of what they know from ProPublica, one of which is "reads The Atlantic" by the way.)

As one former Facebook employee and ad-system said yesterday in your Twitter feed, "yes, but that's less than one click in a thousand that does that." Maybe so! The exact rates are secret, but Harvey Weinstein is not in trouble because every interaction was a grope or rape. If one in a thousand ad clicks in Facebook leaks very private and personally-risky information, that would still be the greatest leak of private information in human history. Keep this in mind if someone writes to say that I am factually and completely wrong, even if it might be Mr. Zuckerberg himself. He may believe what he's saying, but I will gladly explain the above to him so that he can know and testify to the whole truth.

Another ex-Facebook respondent said, "well, maybe so, but Google's just as guilty." Another said my description was from an earlier time when honor and respect were greater at Google.

They may be right, though I hope not. Even though I was the official explainer of these kinds of things for years while sharing an office with Eric Schmidt and discussing right and wrong with Larry Page (and admiring both men from up close in an honest appraisal), I left Google a few years ago not long after Eric passed the CEO baton to Larry. Things could have changed and as I told one of your followers, "I can only speak for what I saw in my decade there."

Thanks to all. As for Zuckerberg and the Congress, let the truth prevail.