Not only are there more than 100 companies that are collecting data on us, making it practically impossible to sort good from bad, but there are key unresolved issues about how we relate to our digital selves and the machines through which they are expressed.
At the heart of the problem is that we increasingly live two lives: a physical one, in which your name, Social Security number, passport number, and driver's license are your main identity markers, and a digital one, in which you have dozens of identity markers, known to you and me as cookies. These markers allow data gatherers to keep tabs on you without knowing your name. Those cookie numbers, which are known only to the entities that assigned them to you, are persistent markers of who you are, but they are not attached to your name, and so not to your physical identity. There is a (thin) wall between the self that buys health insurance and the self that searches for health-related information online.
For real-time advertising bidding, in which audiences are served ads that were purchased milliseconds *after* users arrive at a webpage, ad services "match cookies," so that both sides know who a user is. That information may not be stored by both companies (that is, it may not be added to a user's persistent file), but the practice means that the walls between online data selves are falling away quickly. Everyone can know who you are, even if they call you by a different number.
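To make "matching cookies" concrete, here is a minimal sketch in Python. It is an illustration, not any real ad service's code: the domain names, the `cookie_jar` dictionary standing in for a browser, and the `match_cookies` function are all assumptions made up for this example. It shows two services that each know the same browser by a different number recording that the two numbers belong together.

```python
import uuid


def get_or_set_cookie(cookie_jar, domain):
    """Return the ID this domain has stored in the browser, creating one if absent."""
    if domain not in cookie_jar:
        cookie_jar[domain] = uuid.uuid4().hex[:12]
    return cookie_jar[domain]


def match_cookies(cookie_jar, exchange, bidder, match_table):
    """Record that two services' IDs refer to the same browser (hypothetical flow)."""
    exchange_id = get_or_set_cookie(cookie_jar, exchange)
    bidder_id = get_or_set_cookie(cookie_jar, bidder)
    # Each side can now translate its own number into the other side's number.
    match_table[(exchange, exchange_id)] = (bidder, bidder_id)
    match_table[(bidder, bidder_id)] = (exchange, exchange_id)
    return exchange_id, bidder_id


# One browser, represented only by the cookies it holds (made-up domains).
cookie_jar = {}
match_table = {}
exchange_id, bidder_id = match_cookies(
    cookie_jar, "exchange.example", "bidder.example", match_table
)
```

Neither service learns a name here. What changes is that the two numbers stop being separate selves: a lookup in `match_table` turns one into the other.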
Furthermore, many companies exist simply to collect data and sell it to other companies. Anyone can combine multiple databases into a fully fleshed-out digital portrait. As a Wall Street Journal investigation put it, data companies are "transforming the Internet into a place where people are becoming anonymous in name only." Joe Turow, who recently published a book on online privacy, had even stronger words:
If a company can follow your behavior in the digital environment -- an environment that potentially includes your mobile phone and television set -- its claim that you are "anonymous" is meaningless. That is particularly true when firms intermittently add off-line information such as shopping patterns and the value of your house to their online data and then simply strip the name and address to make it "anonymous." It matters little if your name is John Smith, Yesh Mispar, or 3211466. The persistence of information about you will lead firms to act based on what they know, share, and care about you, whether you know it is happening or not.
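The mechanics behind that warning are not exotic. The following Python sketch, built entirely on made-up records and a hypothetical `link_records` function, shows how a profile with the name stripped out can be tied back to a person when it shares a few attributes with a list that does carry names. The field names and values are assumptions chosen for illustration only.

```python
def link_records(anonymous_profiles, named_records, keys):
    """Attach a name to each profile whose shared attributes match exactly one person."""
    index = {}
    for record in named_records:
        signature = tuple(record[k] for k in keys)
        index.setdefault(signature, []).append(record["name"])

    linked = {}
    for profile in anonymous_profiles:
        candidates = index.get(tuple(profile[k] for k in keys), [])
        if len(candidates) == 1:  # an unambiguous match gives the name away
            linked[profile["cookie_id"]] = candidates[0]
    return linked


# Invented data: "anonymous" tracking profiles and a separate list with names.
anonymous_profiles = [
    {"cookie_id": "3211466", "zip": "20001", "home_value": "400-500k", "birth_year": 1975},
    {"cookie_id": "8840012", "zip": "20002", "home_value": "200-300k", "birth_year": 1988},
]
named_records = [
    {"name": "John Smith", "zip": "20001", "home_value": "400-500k", "birth_year": 1975},
    {"name": "A. Jones", "zip": "20002", "home_value": "200-300k", "birth_year": 1988},
    {"name": "B. Lee", "zip": "20002", "home_value": "200-300k", "birth_year": 1988},
]

linked = link_records(anonymous_profiles, named_records, ["zip", "home_value", "birth_year"])
```

In this toy data the first profile matches exactly one named record, so the number acquires a name. The second matches two people and stays ambiguous, which is the only protection it has.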
Militating against this collapse of privacy is a protection embedded in the very nature of the online advertising system. No person could ever actually look over the world's web tracks. It would be too expensive, and even if you had all the human laborers in the world, they couldn't do the math fast enough to constantly recalculate web surfers' value to advertisers. So machines are the ones that do all of the work.
When new technologies come up against our expectations of privacy, I think it's helpful to make a real-world analogy. But we just do not have an adequate understanding of anonymity in a world where machines can parse all of our behavior without human oversight. Most obviously, with the machine, you have more privacy than if a person were watching your clickstreams, picking up collateral knowledge. A human could easily apply analytical reasoning skills to figure out who you were. And any human could use this data for unauthorized purposes. With our data-driven advertising world, we are relying on machines' current dumbness and inability to "know too much."
This is a double-edged sword. The current levels of machine intelligence insulate us from privacy catastrophe, so we let data be collected about us. But we know that this data is not going away, and machine intelligence is growing rapidly. The results of this process are ineluctable. Left to their own devices, ad tracking firms will eventually be able to connect your various data selves. And then they will break down the name wall, if they are allowed to.
Your visit to this story probably generated data for 13 companies through our website. The great downside to this beautiful, free web that we have is that you have to sell your digital self in order to access it. If you'd like to stop data collection, take a look at Do Not Track Plus. It goes beyond Collusion and browser-based controls in blocking data collection outright.
But I am ultimately unsure what I think about using these tools. Rhetorically, they imply that there will be technological solutions to these data collection problems. Undoubtedly, tech elites will use them. The problem is that the vast majority of Internet users will never know what's churning beneath their browsers. And the advertising lobby is explicitly opposed to setting browser defaults for higher levels of "Do Not Track" privacy. There will be nothing to protect those users from unwittingly giving away vast amounts of data about who they are.
On the other hand, these are the tools that allow websites to eke out a tiny bit more money than they otherwise would. I am all too aware of how difficult it is for media businesses to survive in this new environment. Sure, we could all throw up paywalls and try to make a lot more money from a lot fewer readers. But that would destroy what makes the web the unique resource in human history that it is. I want to keep the Internet healthy, which really does mean keeping money flowing from advertising.