How concerned, really, should ordinary computer users be about the evidence we leave behind when we browse, shop, communicate, and amuse ourselves on the Internet? Early this year, as controversy built over the government’s warrantless surveillance of telephone and computer messages, I asked a number of technology experts whether they were worried about their own privacy, and what steps, if any, they would recommend to the nonexpert computing public. I found strong agreement among them about the scale and nature of the problem, and a surprising emphasis on the need to deal with it and on the right way to do so.
The main thing the experts said they know, and the public probably doesn’t, is how completely modern life has shifted to an “on-the-record” basis. You can drop a letter in a mailbox without a return address and still expect to have it delivered. But that is about the only form of untraceable communication left. Most people understand this change to some degree. As long ago as the mid-1980s, a major twist in the plot of Scott Turow’s Presumed Innocent involved computerized logs of calls made and received. Today a novel could be based on the details preserved in a single credit-card statement.
What most of us might not grasp is just how many activities can now be logged and stored. “To a large extent, we’ve willingly sacrificed our privacy for the conveniences of the Internet age,” says Richard Forno, principal consultant with the information security firm KRvW Associates. OnStar and similar services keep track of locations from which customers make calls. This can be a lifesaver in an emergency, but it also creates a record. Amazon and other online retailers can refine their sense of what we’re looking for by analyzing permanent records of past purchases. We avoid the nuisance of reregistering each time we visit favorite sites, notably newspaper sites, by permitting (even if inadvertently) our computers to store “cookies,” the small files that are created by the site we’ve visited and identify us when we return.
“There are three big categories” of possible intrusions into people’s online privacy, according to Kevin Bankston, a lawyer with the Electronic Frontier Foundation, or EFF, in San Francisco. These are “what you store on your own computer, what other people are storing about you, and what’s actually being captured or overheard in real time, by surveillance.”
The first problem—cookies, old files, unfortunate browsing histories, and other potentially compromising data left on a machine—is the easiest for users to control. Nicole Wong, an associate general counsel at Google, made me burst out laughing with her preposterously wholesome illustration of how awkward situations might arise. “Suppose a husband was shopping online for his wife’s birthday present, and he didn’t want to spoil the surprise by having her see the sites he had visited—” (When I interjected “Come on!” she replied, “I’ve been interviewed before.”) Browser settings can handle cases like this one: Internet Explorer, Firefox, and other major browsers can be set up not to retain cookies, temporary files, or histories of sites visited. This solves the problem, of course, only with respect to family members or others who might want to peruse the contents of your hard drive.
Most people I spoke with said that the third problem—the highly publicized threat of eavesdropping by a Big Brotherish state—might bother them as a policy matter, but was not an active personal worry. EFF’s Kevin Bankston pointed out that users could go far toward protecting themselves by encrypting their communications with powerful, relatively easily installed utilities like PGP (“Pretty Good Privacy”), a program that sells for $99 and up at pgp.com, and GPG, an open-source counterpart available for free at gnupg.org. “We don’t know the [National Security Agency]’s code-breaking capabilities,” Bankston said, “but [any sort of encryption] would certainly slow them down.”
It was the second category—the inexorable pileup of information on a variety of Web sites—that all of the experts identified as the major long-term threat to a user’s privacy. “Data is being collected that was never collected before, that should not be collected now, and that cannot be protected in the long run,” says Marc Rotenberg, executive director of the Electronic Privacy Information Center, a civil-liberties advocacy group based in Washington. The technical developments that make this possible cannot easily be undone, but the business policies could be.
Nearly every interaction in today’s digital life is traceable. Cell-phone calls are of course routed to and from particular handsets—actually, to unique Subscriber Identity Modules, or SIM cards, inside the handset, each associated with a particular customer. E‑mail messages, Web search requests, music-download orders, and all other signals sent from a computer are marked with the “IP address,” or Internet Protocol address, of the machine that sent them. This is a number something like “22.214.171.124” that identifies your particular computer amid the vastness of the Internet. IP addresses differ from phone numbers in many ways: some are permanently assigned to a given machine, some are reassigned session by session, some are shared on local networks. But in the end, their function is similar. They let other people reach you, and tell others who is trying to reach them.
Every query you send to Google contains both the terms you’re looking for and the IP address of your machine. Every site you visit can register the fact that someone from your IP address was there. (You can delete the cookies placed by ViolentOverthrowOfTheGovernment.org on your machine; you can’t do anything about the site’s own logs.) Every time you choose a story to read on an online news site, the story you read, and your IP address, can be recorded in the site’s log. Every blog posting or comment, every e-mail sent even under a fictitious account name, every item bid on through eBay or bought from an online merchant, every request for a map to be downloaded or a picture viewed—they all carry an IP address.
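To make the idea of a site's own log concrete, here is a minimal sketch in Python of how a few log entries could be grouped by the address that made each request. Everything in it is an assumption for illustration: the sample lines follow the widely used Common Log Format, the addresses come from ranges reserved for documentation, and the paths, timestamps, and the helper name pages_by_address are invented. It does not describe how any particular company stores or analyzes its records.

```python
import re
from collections import defaultdict

# Hypothetical log entries in Common Log Format. The addresses are taken
# from ranges reserved for documentation; paths and times are invented.
SAMPLE_LOG = """\
203.0.113.7 - - [10/Mar/2006:13:55:36 -0800] "GET /news/story-1.html HTTP/1.1" 200 5120
198.51.100.23 - - [10/Mar/2006:13:56:02 -0800] "GET /maps/downtown.png HTTP/1.1" 200 20480
203.0.113.7 - - [10/Mar/2006:13:57:41 -0800] "GET /news/story-2.html HTTP/1.1" 200 4096
"""

# One entry: address, two unused fields, timestamp, request line, status, size.
LINE_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+$'
)


def pages_by_address(log_text):
    """Group the requested paths by the IP address that asked for them."""
    visits = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:  # silently skip anything that is not a well-formed entry
            visits[match.group("ip")].append(match.group("path"))
    return dict(visits)


if __name__ == "__main__":
    for ip, paths in pages_by_address(SAMPLE_LOG).items():
        print(ip, paths)
```

Clearing cookies on the visitor's machine changes nothing in this picture, since the entries live on the site's side. The two news stories requested from the same address remain linked for as long as the log is kept.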
Like a phone number, an IP address is merely digits, without a person’s name attached. But the connection between IP addresses and real people is almost as close as with telephone accounts. Anyone who pays for home Internet service, whether cable, DSL, or humble dial-up, has been assigned an IP address by a company that also knows the customer’s name. Most for-pay WiFi hot spots also require customers to create accounts with a real name and billing address. You can still go online without revealing your identity, if you’re willing to live like a fugitive: paying cash to use Internet cafés, sticking strictly to free, public WiFi spots, libraries, or schools. But for most people this is a chore.
The main privacy concerns about IP addresses stem from one business decision: the companies that collect and own the information traceable to them have decided to retain it more or less forever.
Why would Google, which receives hundreds of millions of search requests per day, warehouse every one of them, with IP address attached? Because it can. Disk storage has become essentially free. Also it wants to, because for Google and most other online firms, real-world transaction data is the most precious form of market intelligence. Nicole Wong, of Google, gave me the standard reasons why this ever-expanding hoard of data is best for the company and for its users as well. It may be retained to help firms investigate and remedy “click fraud,” or invalid clicks on advertisements, in which it matters (for reasons not worth going into here) how many requests come from each unique address. It helps show the “geolocation” of all requests, “which lets us address a number of issues at a geographic level, such as where there might be too much latency [slowness in Google’s response to a query], for example, in Africa.”