I'm Being Followed: How Google and 104 Other Companies Are Tracking Me on the Web
Who are these companies and what do they want from me? A voyage into the invisible business that funds the web.
This morning, if you opened your browser and went to NYTimes.com, an amazing thing happened in the milliseconds between your click and when the news about North Korea and James Murdoch appeared on your screen. Data from this single visit was sent to 10 different companies, including Microsoft and Google subsidiaries, a gaggle of traffic-logging sites, and other, smaller ad firms. Nearly instantaneously, these companies can log your visit, place ads tailored for your eyes specifically, and add to the ever-growing online file about you.
There's nothing necessarily sinister about this subterranean data exchange: this is, after all, the advertising ecosystem that supports free online content. All the data lets advertisers tune their ads, and the rest of the information logging lets them measure how well things are actually working. And I do not mean to pick on The New York Times. When you visit the Huffington Post or The Atlantic or Business Insider, the same process happens to a greater or lesser degree. Every move you make on the Internet is worth some tiny amount to someone, and a panoply of companies want to make sure that no step along your Internet journey goes unmonetized.
Even if you're generally familiar with the idea of data collection for targeted advertising, the number and variety of these data collectors will probably astonish you. Allow me to introduce the list of companies that tracked my movements on the Internet in one recent 36-hour period of standard web surfing: Acerno. Adara Media. Adblade. Adbrite. ADC Onion. Adchemy. ADiFY. AdMeld. Adtech. Aggregate Knowledge. AlmondNet. Aperture. AppNexus. Atlas. Audience Science.
And that's just the As. My complete list includes 105 companies, and there are dozens more than that in existence. You, too, could compile your own list using Mozilla's tool, Collusion, which records the companies that are capturing data about you, or more precisely, your digital self.
While the big names -- Google, Microsoft, Facebook, Yahoo, etc. -- show up in this catalog, the bulk of it is composed of smaller data and advertising businesses that form a shadow web of companies that want to help show you advertising that you're more likely to click on and products that you're more likely to purchase.
To be clear, these companies gather data without attaching it to your name; they use that data to show you ads you're statistically more likely to click. That's the game, and there is substantial money in it.
As users, we move through our Internet experiences unaware of the churning subterranean machines powering our web pages with their cookies and pixel trackers, their tracking code and databases. We shop for wedding caterers and suddenly see ring ads appear on random web pages we're visiting. We sometimes think the ads following us around the Internet are "creepy." We sometimes feel watched. Does it matter? We don't really know what to think.
The issues the industry raises did not exist when Ronald Reagan was president and were only in nascent form when the Twin Towers fell. These are phenomena of our time and while there are many antecedent forms of advertising, never before in the history of human existence has so much data been gathered about so many people for the sole purpose of selling them ads.
"The best minds of my generation are thinking about how to make people click ads," my old friend and early Facebook employee Jeff Hammerbacher once said. "That sucks," he added. But increasingly I think these issues -- how we move "freely" online, or more properly, how we pay one way or another -- are actually the leading edge of a much bigger discussion about the relationship between our digital and physical selves. I don't mean theoretically or psychologically. I mean that the norms established to improve how often people click ads may end up determining who you are when viewed by a bank or a romantic partner or a retailer who sells shoes.
Already, the web sites you visit reshape themselves before you like a carnivorous school of fish, and this is only the beginning. Right now, a huge chunk of what you've ever looked at on the Internet is sitting in databases all across the world. The line separating all that it might say about you, good or bad, is as thin as the letters of your name. If and when that wall breaks down, the numbers may overwhelm the name. The unconsciously created profile may mean more than the examined self I've sought to build.
Most privacy debates have been couched in technical terms. We read about how Google bypassed Safari's privacy settings, whatever those were. Or we read the details about how Facebook tracks you with those friendly Like buttons. Behind the details, however, is a tangle of philosophical issues that are at the heart of the struggle between privacy advocates and online advertising companies: What is anonymity? What is identity? How similar are humans and machines? This essay is an attempt to think through those questions.
The bad news is that people haven't taken control of the data that's being collected and traded about them. The good news is that -- in a quite literal sense -- simply thinking differently about this advertising business can change the way that it works. After all, if you take these companies at their word, they exist to serve users as much as to serve their clients.
Before we get too deep, let's talk about the reality of the online display advertising industry. (That means, essentially, all the ads not associated with a web search.) There is a dizzying array of companies and services that can all make a buck by helping advertisers target you a teensy, weensy bit better than the next guy. These are companies that must prove themselves quite narrowly in measurable revenue and profit; the competition is fierce, the prize is large, and the strategies are ever-changing. Here's the coral-reef level diversity of corporate life in display advertising, as cataloged by Luma Partners a little over a year ago:
Don't get too caught up in all of that, though. There are three basic categories: people who help the buyers (on the left), people who help the sellers (on the right), and a whole lot of people who assist either side with more data, faster service, or better measurement. Let's zoom in on three companies from our list of As to give you an idea of the kinds of outfits we're talking about.
Adnetik is a standard targeting company that uses real-time bidding. They can offer targeted ads based on how users act (behavioral), who they are (demographic), where they live (geographic), and who they seem like online (lookalike), as well as something they call "social proximity." They also give advertisers the ability to choose the types of sites on which their ads will run based on "parameters like publisher brand equity, contextual relevance to the advertiser, brand safety, level of ad clutter and content quality."
It's worth noting how different this practice is from traditional advertising. The social contract between advertisers and publications used to be that publications gathered particular types of people into something called an audience, then advertisers purchased ads in that publication to reach that audience. There was an art to it, and some publications had cachet while others didn't. Online advertising upends all that: Now you can buy the audience without the publication. You want an Atlantic reader? Great! Some ad network can sell you someone who has been to The Atlantic but is now reading about hand lotion at KnowYourHandLotions.com. And they'll sell you that set of eyeballs for a fifth of the price. You can bid in real-time on a set of those eyeballs across millions of sites without ever talking to an advertising salesperson. (Of course, such a tradeoff has costs, which we'll see soon.)
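Exactly how these real-time auctions pick a winner varies by exchange. A common design at the time was a second-price rule, and this minimal Python sketch assumes one: the highest bid wins, but the winner pays the runner-up's bid or the publisher's floor price, whichever is higher. The bidders, prices, and the `floor` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    bidder: str   # hypothetical ad buyer
    cpm: float    # bid in dollars per thousand impressions


def run_auction(bids, floor=0.0):
    """Choose a winner for one impression under an assumed second-price rule.

    Bids below the (hypothetical) publisher floor are ignored. Returns
    (winning_bidder, clearing_price), or None if nobody clears the floor.
    """
    eligible = [b for b in bids if b.cpm >= floor]
    if not eligible:
        return None
    ranked = sorted(eligible, key=lambda b: b.cpm, reverse=True)
    winner = ranked[0]
    # With one eligible bid, the winner pays the floor.
    runner_up = ranked[1].cpm if len(ranked) > 1 else floor
    return winner.bidder, max(runner_up, floor)


bids = [Bid("shoe_retailer", 2.50), Bid("lotion_brand", 1.75), Bid("airline", 0.40)]
print(run_auction(bids, floor=0.50))  # ('shoe_retailer', 1.75)
```

The whole exchange, from request to winning ad, has to fit inside the milliseconds it takes a page to load, which is why none of it involves a salesperson.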
Adnetik also offers a service called "retargeting" that another A-company, AdRoll, specializes in. Here's how it works. Let's say you're an online shoe merchant. Someone comes to your store but doesn't purchase anything. While they're there, you drop a cookie on them. Thereafter you can target ads to them, knowing that they're at least mildly interested. Even better, you can drop cookies on everyone who comes to look at shoes and then watch to see who comes back to buy. Those people become your training data, and soon you're "retargeting" only the visitors whose data profiles indicate that they're likely to purchase something from you eventually. It's slick, especially if people don't notice that the pairs of shoes they found the willpower not to purchase just happen to be showing up on their favorite gardening sites.
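Here is a toy version of that shoe-store loop, assuming a single store that sees only its own visitors. Each visitor gets a random ID standing in for the cookie, purchases become the training data, and a crude threshold (browsing at least as much as the least-engaged buyer) stands in for a real purchase-likelihood model. The `ShoeStore` class and its methods are hypothetical.

```python
import uuid
from collections import defaultdict


class ShoeStore:
    """Toy retargeting tracker for a hypothetical online shoe store."""

    def __init__(self):
        self.views = defaultdict(int)  # cookie ID -> shoe pages viewed
        self.buyers = set()            # cookie IDs that later bought something

    def drop_cookie(self):
        # A random, persistent ID stands in for the browser cookie.
        return uuid.uuid4().hex

    def record_view(self, cookie_id):
        self.views[cookie_id] += 1

    def record_purchase(self, cookie_id):
        self.buyers.add(cookie_id)

    def retargeting_list(self):
        """Non-buyers who browsed at least as much as the least-engaged buyer.

        Past buyers are the training data; the "model" is a single threshold
        learned from them.
        """
        buyer_views = [self.views.get(c, 0) for c in self.buyers]
        if not buyer_views:
            return set()
        threshold = min(buyer_views)
        return {c for c, n in self.views.items()
                if c not in self.buyers and n >= threshold}


store = ShoeStore()
a, b, c = store.drop_cookie(), store.drop_cookie(), store.drop_cookie()
for _ in range(3):
    store.record_view(a)
store.record_purchase(a)      # visitor a came back and bought
for _ in range(4):
    store.record_view(b)      # b browsed a lot but never bought
store.record_view(c)          # c glanced once
print(store.retargeting_list() == {b})  # True
```

Visitor b is the one whose shoes will follow them to the gardening sites; c, who barely looked, isn't worth the ad spend.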
There are many powerful things you can do once you've got data on a user, so the big worries for online advertisers shift to the inventory itself. Purchasing a page in a magazine is a process through which advertisers have significant control; but these types of online ads could conceivably run anywhere. After all, many ad networks need all the inventory they can get, so they sign up all kinds of content providers. And that's where our third company comes into play.
AdExpose, now a comScore company, watches where and how ads are run to determine if their purchasers got their money's worth. "Up to 80% of interactive ads are sold and resold through third parties," as they put it on their website. "This daisychaining brings down the value of online ads and advertisers don't always know where their ads have run." To solve that problem, AdExpose claims to provide independent verification of an ad's placement.
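At its simplest, placement verification compares where ads actually ran with where the buyer wanted them to run. This is a minimal sketch, assuming a hypothetical log of the page URLs on which an ad rendered and an advertiser-supplied list of approved domains; it is not AdExpose's method, which isn't described here.

```python
from urllib.parse import urlparse


def placement_report(impression_urls, approved_domains):
    """Split logged impressions into approved and unapproved placements.

    A hostname counts as approved if it equals an approved domain or is a
    subdomain of one. Returns (approved, unapproved, unapproved_share).
    """
    approved, unapproved = [], []
    for url in impression_urls:
        host = (urlparse(url).hostname or "").lower()
        ok = any(host == d or host.endswith("." + d) for d in approved_domains)
        (approved if ok else unapproved).append(url)
    share = len(unapproved) / len(impression_urls) if impression_urls else 0.0
    return approved, unapproved, share


log = [
    "https://www.theatlantic.com/technology/",
    "https://knowyourhandlotions.com/reviews",
    "https://theatlantic.com/politics/",
    "http://example-content-farm.net/page?id=7",
]
good, bad, share = placement_report(log, {"theatlantic.com"})
print(bad)    # ['https://knowyourhandlotions.com/reviews', 'http://example-content-farm.net/page?id=7']
print(share)  # 0.5
```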
All three companies want to know as much about me and what's on my screen as they possibly can, although they have different reasons for their interest. None of them seems like an evil company, and none of them is unusual. Like much of this industry, they seem to believe in what they're doing. They deliver more relevant advertising to consumers, and that makes more money for companies. They are simply tools to improve the grip strength of the invisible hand.
And yet, the revelation that 105 different outfits were collecting and presumably selling data about me on the Internet gives me pause. It's not just Google or Facebook or Yahoo. There are literally dozens and dozens of these companies and the average user has no idea what they do or how they work. We just know that for some reason, at one point or another, an organization dropped a cookie on us and has created a file on some server, steadily accumulating clicks and habits that will eventually be mined and marketed.
The online advertising industry argues that technology is changing so rapidly that regulation is not the answer to my queasiness about all that data going off to who-knows-where. The problem, however, is that the industry's version of self-regulation is not one that most people would expect or agree with, as I found out myself.
After running Collusion for a few days, I wanted to see if there was an easy method to stop data collection. Naively, I went to the self-regulatory site run by the Network Advertising Initiative and completed their "Opt Out" form. I did so for the dozens of companies listed and I would say that it was a simple and nominally effective process. That said, I wasn't sure if data would stop being collected on me or not. The site itself does not say that data collection will stop, but it's also not clear that data collection will continue. In fact, the overview of NAI's principles freely mixes talk about how the organization's code "limits the types of data that member companies can use" with information about the opt-out process.
After opting out, I went back to Collusion to see if companies were still tracking me. I found that many, many companies still appeared to be logging data about me. According to Mozilla, the current version of Collusion does not allow me to see precisely which companies are still tracking, but Stanford researchers using Collusion found that at least some companies continue to collect data. All that I had "opted out" of was receiving targeted ads, not data collection. There is no way, through the companies' own self-regulatory apparatus, to stop being tracked online. None.
After those Stanford researchers posted their results to a university blog, they received a sharp response from the NAI's then-chief, Chuck Curran.
In essence, Curran argued that users do not have the right to *not* be tracked. "We've long recognized that consumers should be provided a choice about whether data about their likely interests can be used to make their ads more relevant," he wrote. "But the NAI code also recognizes that companies sometimes need to continue to collect data for operational reasons that are separate from ad targeting based on a user's online behavior."
Companies "need to continue to collect data," but that contrasts directly with users' desire "not to be tracked." The only right that online advertisers are willing to give users is the ability not to have ads served to them based on their web histories. Curran himself admits this: "There is a vital distinction between limiting the use of online data for ad targeting, and banning data collection outright."
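The distinction Curran draws is easy to see in code. The hypothetical ad server sketched here checks an opt-out cookie, but only when choosing which ad to show; the request itself is logged either way. The cookie names (`uid`, `opt_out`) and the whole `Tracker` class are invented for illustration and don't describe any NAI member's actual system.

```python
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """Hypothetical ad server: opting out changes the ad, not the logging."""
    log: list = field(default_factory=list)       # every request lands here
    profiles: dict = field(default_factory=dict)  # cookie ID -> topics seen

    def handle_request(self, cookies, page_topic):
        user = cookies.get("uid", "anonymous")
        # Collection happens unconditionally ("operational" logging).
        self.log.append((user, page_topic))
        if cookies.get("opt_out") == "1":
            # The opt-out only affects which ad is chosen.
            return "generic ad"
        topics = self.profiles.setdefault(user, [])
        topics.append(page_topic)
        return f"ad targeted at: {topics[-1]}"


tracker = Tracker()
print(tracker.handle_request({"uid": "u1"}, "shoes"))                   # ad targeted at: shoes
print(tracker.handle_request({"uid": "u2", "opt_out": "1"}, "health"))  # generic ad
print(len(tracker.log))                                                 # 2
```

The opted-out visitor sees a generic ad, yet their visit to a health page still sits in the log.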
But based on the scant survey and anecdotal data that we have available, when users opt out, preventing data collection is *precisely* what they are after.
In preliminary results from a survey conducted last year, Aleecia McDonald, a fellow at Stanford's Center for Internet and Society, found that users expected a lot more from the current set of tools than those tools deliver. Among respondents who looked at the NAI's opt-out page, the largest group (34 percent) thought that it was "a website that lets you tell companies not to collect data about you." For browser-based "Do Not Track" tools, a full 61 percent of respondents expected that if they clicked such a button, no data would be collected about them.
Do Not Track tools have become a major point of contention. The idea is that if you enable one in your browser, when you arrive at The New York Times, you send a herald out ahead of you that says, "Do not collect data about me." Members of the NAI have agreed, in principle, to follow the DNT provisions, but now the debate has shifted to the details.
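Technically, that herald is an HTTP request header, `DNT: 1`, which the browser attaches to each request; `DNT: 0` signals consent to tracking, and no header means no stated preference. Honoring it is up to the site. This is a minimal sketch of one possible site-side policy; the function names and the choice to skip logging entirely are assumptions, not something any standard required at the time.

```python
def normalize(headers):
    # HTTP header names are case-insensitive, so compare them in lowercase.
    return {k.lower(): v.strip() for k, v in headers.items()}


def tracking_preference(headers):
    """Map the DNT request header to a preference string."""
    value = normalize(headers).get("dnt")
    if value == "1":
        return "do-not-track"
    if value == "0":
        return "consent"
    return "unset"  # header absent or carrying some other value


def log_page_view(headers, log):
    """Hypothetical policy: record the visit only if DNT: 1 was not sent.

    Returns True if the visit was logged.
    """
    if tracking_preference(headers) == "do-not-track":
        return False
    log.append(normalize(headers).get("user-agent", "unknown"))
    return True


visits = []
log_page_view({"DNT": "1", "User-Agent": "Firefox"}, visits)  # False: not logged
log_page_view({"User-Agent": "Safari"}, visits)               # True: logged
print(visits)  # ['Safari']
```

Note that nothing in the header forces this behavior; the same request could just as easily be logged in full, which is why the debate is over what sites agree to do with it.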
There is a fascinating scrum over what "Do Not Track" tools should do and which user requests websites will have to respect. The Digital Advertising Alliance (of which the NAI is a part), the Federal Trade Commission, W3C, the Interactive Advertising Bureau (also part of the DAA), and privacy researchers at academic institutions are all involved. In November, the DAA put out a new set of principles that contain some good ideas, like the prohibition of "collection, use or transfer of Internet surfing data across Websites for determination of a consumer's eligibility for employment, credit standing, healthcare treatment and insurance."
This week, the White House seemed to side with privacy advocates who want to limit collection, not just uses. Its Consumer Privacy Bill of Rights pushes companies to allow users to "exercise control over what personal data companies collect from them and how they use it." The DAA heralded its own participation in the White House process, though even it noted this is the beginning of a long journey.
There has been a clear and real philosophical difference between the advertisers and regulators representing web users. On the one hand, as Stanford privacy researcher Jonathan Mayer put it, "Many stakeholders on online privacy, including U.S. and EU regulators, have repeatedly emphasized that effective consumer control necessitates restrictions on the collection of information, not just prohibitions on specific uses of information." But advertisers want to keep collecting as much data as they can, as long as they promise not to use it to target ads at users who have opted out. That's why the NAI opt-out program works like it does.
Let's not linger too long on the technical implementation here: there may be some topics around which compromises can be found. Some definition of "Do Not Track" that suits industry and privacy people may be crafted. Various issues related to differences between first- and third-party cookies may be resolved. But the battle over data collection and ad targeting goes much deeper than the tactical, technical issues that dominate the discussion.
Let's assume good faith on behalf of advertising companies and confront the core issue head-on: Should users be able to stop data collection, even if companies aren't doing anything "bad" with it? Should that be a right, as the White House contends, and, more importantly, why?
Companies' ability to track people online has significantly outpaced the cultural norms and expectations of privacy. This is not because online companies are worse than their offline counterparts, but rather because what they can do is so, so different. We don't have a language for talking about how these companies function or how our society should deal with them.
The word you hear over and over and over is that targeted ads can be "creepy." It even crops up in the academic literature, despite its vague meaning in this context. My intuition is that we use the word "creepy" precisely because it is an indeterminate word. It connotes that tingling-back-of-the-neck feeling, but not necessarily more than that. The creepy feeling is a sign to pay attention to a possibly harmful phenomenon. But we can't sort our feelings into categories -- dangerous or harmless -- because we don't actually know what's going to happen with all the data that's being collected.
Not only are there more than 100 companies that are collecting data on us, making it practically impossible to sort good from bad, but there are key unresolved issues about how we relate to our digital selves and the machines through which they are expressed.
At the heart of the problem is that we increasingly live two lives: a physical one, in which your name, Social Security number, passport number, and driver's license are your main identity markers, and a digital one, in which you have dozens of identity markers, known to you and me as cookies. These markers allow data gatherers to keep tabs on you without your name. Those cookie numbers, which are known only to the entities that assigned them to you, are persistent markers of who you are, but they remain unattached to your physical identity through your name. There is a (thin) wall between the self that buys health insurance and the self that searches for health-related information online.
For real-time advertising bidding, in which audiences are served ads that were purchased milliseconds *after* users arrive at a webpage, ad services "match cookies," so that both sides know who a user is. While that information may not be stored by both companies (i.e., it's not added to a user's persistent file), it means that the walls between online data selves are falling away quickly. Everyone can know who you are, even if they call you by a different number.
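Cookie matching (often called cookie syncing) can be sketched with two companies that each know a user only by their own random ID. The sketch assumes the match has already been triggered, for instance by a redirect or tracking pixel that carries one company's ID to the other's server while the browser holds both cookies; here that step is a plain function call. The class, the function, and the data are all hypothetical.

```python
import secrets


class AdService:
    """A hypothetical ad company that knows users only by its own cookie IDs."""

    def __init__(self, name):
        self.name = name
        self.match_table = {}  # partner's ID for a user -> our ID for that user
        self.profiles = {}     # our ID -> observed interests

    def new_id(self):
        return f"{self.name}-{secrets.token_hex(4)}"


def sync_cookies(exchange, exchange_id, bidder, bidder_id):
    """Record that two different IDs belong to the same browser."""
    bidder.match_table[exchange_id] = bidder_id
    exchange.match_table[bidder_id] = exchange_id


exchange, bidder = AdService("exch"), AdService("bidr")
ex_id, bd_id = exchange.new_id(), bidder.new_id()
bidder.profiles[bd_id] = ["wedding caterers", "rings"]
sync_cookies(exchange, ex_id, bidder, bd_id)

# Later, a bid request arrives carrying only the exchange's ID for this user...
incoming = {"user_id": ex_id, "page": "gardening-site.example"}
known = bidder.match_table.get(incoming["user_id"])
print(bidder.profiles.get(known))  # ['wedding caterers', 'rings']
```

Neither company ever learns the user's name, yet the bidder recognizes the person behind the exchange's number instantly.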
Furthermore, many companies are just out there collecting data to sell to other companies. Anyone can combine multiple databases together into a fully fleshed-out digital portrait. As a Wall Street Journal investigation put it, data companies are "transforming the Internet into a place where people are becoming anonymous in name only." Joe Turow, who recently published a book on online privacy, had even stronger words.
If a company can follow your behavior in the digital environment -- an environment that potentially includes your mobile phone and television set -- its claim that you are "anonymous" is meaningless. That is particularly true when firms intermittently add off-line information such as shopping patterns and the value of your house to their online data and then simply strip the name and address to make it "anonymous." It matters little if your name is John Smith, Yesh Mispar, or 3211466. The persistence of information about you will lead firms to act based on what they know, share, and care about you, whether you know it is happening or not.
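Turow's point about stripped names can be shown with a small, entirely hypothetical dataset. The sketch assumes the offline records have already been matched to a cookie ID (how that match happens is outside its scope); deleting the name and address afterward still leaves a profile keyed to the same persistent number.

```python
# Hypothetical data: online browsing keyed by cookie ID...
online = {
    "3211466": ["mortgage rates", "running shoes", "symptom checker"],
}
# ...and an offline list already matched to those cookie IDs (an assumption).
offline = [
    {"name": "John Smith", "address": "12 Elm St", "cookie": "3211466",
     "home_value": 410_000, "shops_at": ["outdoor store"]},
]


def merge_and_strip(online, offline):
    """Join offline records onto online profiles, then drop name and address."""
    merged = {}
    for rec in offline:
        cid = rec["cookie"]
        if cid not in online:
            continue
        profile = {k: v for k, v in rec.items()
                   if k not in ("name", "address", "cookie")}
        profile["browsing"] = online[cid]
        merged[cid] = profile  # still keyed by the persistent cookie ID
    return merged


anon = merge_and_strip(online, offline)
print(anon["3211466"]["home_value"])  # 410000
print("name" in anon["3211466"])      # False
```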
Militating against this collapse of privacy is a protection embedded in the very nature of the online advertising system. No person could ever actually look over the world's web tracks. It would be too expensive and even if you had all the human laborers in the world, they couldn't do the math fast enough to constantly recalculate web surfers' value to advertisers. So, machines are the ones that do all of the work.
When new technologies come up against our expectations of privacy, I think it's helpful to draw a real-world analogy. But we just do not have an adequate understanding of anonymity in a world where machines can parse all of our behavior without human oversight. Most obviously, with the machine, you have more privacy than if a person were watching your clickstreams, picking up collateral knowledge. A human could easily apply analytical reasoning skills to figure out who you were. And any human could use this data for unauthorized purposes. With our data-driven advertising world, we are relying on machines' current dumbness and inability to "know too much."
This is a double-edged sword. The current levels of machine intelligence insulate us from privacy catastrophe, so we let data be collected about us. But we know that this data is not going away and yet machine intelligence is growing rapidly. The results of this process are ineluctable. Left to their own devices, ad tracking firms will eventually be able to connect your various data selves. And then they will break down the name wall, if they are allowed to.
Your visit to this story probably generated data for 13 companies through our website. The great downside to this beautiful, free web that we have is that you have to sell your digital self in order to access it. If you'd like to stop data collection, take a look at Do Not Track Plus. It goes beyond Collusion and browser-based controls in blocking data collection outright.
But I am ultimately unsure what I think about using these tools. Rhetorically, they imply that there will be technological solutions to these data collection problems. Undoubtedly, tech elites will use them. The problem is that the vast majority of Internet users will never know what's churning beneath their browsers. And the advertising lobby is explicitly opposed to setting browser defaults for higher levels of "Do Not Track" privacy. Those users will have nothing to protect them from unwittingly giving away vast amounts of data about who they are.
On the other hand, the tracking these tools block is what allows websites to eke out a tiny bit more money than they otherwise would. I am all too aware of how difficult it is for media businesses to survive in this new environment. Sure, we could all throw up paywalls and try to make a lot more money from a lot fewer readers. But that would destroy what makes the web the unique resource in human history that it is. I want to keep the Internet healthy, which really does mean keeping money flowing from advertising.
I wish there were more obvious villains in this story. The saving grace may end up being that as companies move to more obtrusive, higher-production-value ads, targeting may become ineffective. Avi Goldfarb of the Rotman School of Management and Catherine Tucker of MIT's Sloan School found last year that the big, obtrusive ads that marketers love do not work better with targeting, but worse.
"Ads that match both website content and are obtrusive do worse at increasing purchase intent than ads that do only one or the other," they wrote in a 2011 Marketing Science journal paper. "This failure appears to be related to privacy concerns: the negative effect of combining targeting with obtrusiveness is strongest for people who refuse to give their income and for categories where privacy matters most."
Perhaps there are natural limits to what data targeting can do for advertisers and when we look back in 10 years at why data collection practices changed, it will not be because of regulation or self-regulation or a user uprising. No, it will be because the best ads could not be targeted. It will be because the whole idea did not work and the best minds of the next generation will turn their attention to something else.