How concerned, really, should ordinary computer users be about the evidence we leave behind when we browse, shop, communicate, and amuse ourselves on the Internet? Early this year, as controversy built over the government’s warrantless surveillance of telephone and computer messages, I asked a number of technology experts whether they were worried about their own privacy, and what steps, if any, they would recommend to the nonexpert computing public. I found strong agreement among them about the scale and nature of the problem, and a surprising emphasis on the need and the right way to deal with it.
The main thing the experts said they know, and the public probably doesn’t, is how completely modern life has shifted to an “on-the-record” basis. You can drop a letter in a mailbox without a return address and still expect to have it delivered. But that is about the only form of untraceable communication left. Most people understand this change to some degree. As long ago as the mid-1980s, a major twist in the plot of Scott Turow’s Presumed Innocent involved computerized logs of calls made and received. Today a novel could be based on the details preserved in a single credit-card statement.
What most of us might not grasp is just how many activities can now be logged and stored. “To a large extent, we’ve willingly sacrificed our privacy for the conveniences of the Internet age,” says Richard Forno, principal consultant with the information security firm KRvW Associates. OnStar and similar services keep track of locations from which customers make calls. This can be a lifesaver in an emergency, but it also creates a record. Amazon and other online retailers can refine their sense of what we’re looking for by analyzing permanent records of past purchases. We avoid the nuisance of reregistering each time we visit favorite sites, notably newspaper sites, by permitting (even if inadvertently) our computers to store “cookies,” the small files that are created by the site we’ve visited and identify us when we return.
“There are three big categories” of possible intrusions into people’s online privacy, according to Kevin Bankston, a lawyer with the Electronic Frontier Foundation, or EFF, in San Francisco. These are “what you store on your own computer, what other people are storing about you, and what’s actually being captured or overheard in real time, by surveillance.”
The first problem—cookies, old files, unfortunate browsing histories, and other potentially compromising data left on a machine—is the easiest for users to control. Nicole Wong, an associate general counsel at Google, made me burst out laughing with her preposterously wholesome illustration of how awkward situations might arise. “Suppose a husband was shopping online for his wife’s birthday present, and he didn’t want to spoil the surprise by having her see the sites he had visited—” (When I interjected “Come on!” she replied, “I’ve been interviewed before.”) Whatever the motive, covering such tracks is straightforward. For instance, Internet Explorer, Firefox, and other major browsers can be set up not to retain cookies, temporary files, or histories of sites visited. This solves the problem, of course, only for family members or others who might want to peruse the contents of your hard drive.
Most people I spoke with said that the third problem—the highly publicized threat of eavesdropping by a Big Brotherish state—might bother them as a policy matter, but was not an active personal worry. EFF’s Kevin Bankston pointed out that users could go far toward protecting themselves by encrypting their communications with powerful, relatively easily installed utilities like PGP (“Pretty Good Privacy”), a program that sells for $99 and up at pgp.com, and GPG, an open-source counterpart available for free at gnupg.org. “We don’t know the [National Security Agency]’s code-breaking capabilities,” Bankston said, “but [any sort of encryption] would certainly slow them down.”
It was the second category—the inexorable pileup of information on a variety of Web sites—that all of the experts identified as the major long-term threat to a user’s privacy. “Data is being collected that was never collected before, that should not be collected now, and that cannot be protected in the long run,” says Marc Rotenberg, executive director of the Electronic Privacy Information Center, a civil-liberties advocacy group based in Washington. The technical developments that make this possible cannot easily be undone, but the business policies could be.
Nearly every interaction in today’s digital life is traceable. Cell-phone calls are of course routed to and from particular handsets—actually, to unique Subscriber Identity Modules, or SIM cards, inside the handset, each associated with a particular customer. E‑mail messages, Web search requests, music-download orders, and all other signals sent from a computer are marked with the “IP address,” or Internet Protocol address, of the machine that sent them. This is a number something like “126.96.36.199” that identifies your particular computer amid the vastness of the Internet. IP addresses differ from phone numbers in many ways: some are permanently assigned to a given machine, some are reassigned session by session, some are shared on local networks. But in the end, their function is similar. They let other people reach you, and tell others who is trying to reach them.
Every query you send to Google contains both the terms you’re looking for and the IP address of your machine. Every site you visit can register the fact that someone from your IP address was there. (You can delete the cookies placed by ViolentOverthrowOfTheGovernment.org on your machine; you can’t do anything about the site’s own logs.) Every time you choose a story to read on an online news site, the story you read, and your IP address, can be recorded in the site’s log. Every blog posting or comment, every e-mail sent even under a fictitious account name, every item bid on through eBay or bought from an online merchant, every request for a map to be downloaded or a picture viewed—they all carry an IP address.
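To make the point about site logs concrete, here is a minimal sketch in Python of what one line in a Web server's log might hold and how easily it is read back. It assumes the widely used Common Log Format. The sample entry, the field labels, and the function name `parse_log_line` are invented for illustration and describe no particular company's system.

```python
import re

# Pattern for one Common Log Format entry. The group names are our own labels.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line):
    """Return the fields a site could retain for one request, or None."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Invented entry; 203.0.113.7 comes from a range reserved for documentation.
sample = (
    '203.0.113.7 - - [10/Mar/2006:13:55:36 -0500] '
    '"GET /search?q=birthday+gifts HTTP/1.1" 200 2326'
)

record = parse_log_line(sample)
print(record["ip"], record["time"], record["path"])
```

Nothing here depends on cookies or on anything stored on the visitor's machine, which is why deleting cookies leaves a record like this untouched.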
Like a phone number, an IP address is merely digits, without a person’s name attached. But the connection between IP addresses and real people is almost as close as with telephone accounts. Anyone who pays for home Internet service, whether cable, DSL, or humble dial-up, has been assigned an IP address by a company that also knows the customer’s name. Most for-pay WiFi hot spots also require customers to create accounts with a real name and billing address. You can still go online without revealing your identity, if you’re willing to live like a fugitive: paying cash to use Internet cafés, sticking strictly to free, public WiFi spots, libraries, or schools. But for most people this is a chore.
The main privacy concerns about IP addresses stem from one business decision: the companies that collect and own the information traceable to them have decided to retain it more or less forever.
Why would Google, which receives hundreds of millions of search requests per day, warehouse every one of them, with IP address attached? Because it can. Disk storage has become essentially free. Also it wants to, because for Google and most other online firms, real-world transaction data is the most precious form of market intelligence. Nicole Wong, of Google, gave me the standard reasons why this ever-expanding hoard of data is best for the company and for its users as well. It may be retained to help firms investigate and remedy “click fraud,” or invalid clicks on advertisements, in which it matters (for reasons not worth going into here) how many requests come from each unique address. It helps show the “geolocation” of all requests, “which lets us address a number of issues at a geographic level, such as where there might be too much latency [slowness in Google’s response to a query], for example, in Africa.”
The real reason, for firms from Google and Yahoo to Amazon, eBay, and Expedia, is that they are all in an endless struggle to entice more people to spend more time on their sites, so they can sell more advertising. This whole effort, they believe, depends crucially on their ability to “personalize” their services. “The information about you is gold, and it’s used for ever more perfect marketing to you,” Kevin Bankston told me. “Nothing will change that unless there is a law to force them to stop.”
For users, “the fear is Panopticon,” says Lawrence Lessig, of Stanford Law School, referring to the unseen but all-seeing observer in a prison watchtower once proposed by Jeremy Bentham. “The critical point to recognize is that there simply is no such thing as anonymity on the Internet. That is not because of its technical architecture. It is because of the business model of these companies, which depends on gathering and storing as much data about the customer as you possibly can.”
What’s the potential harm? Every person I spoke with gave an example. A few were political, but most concerned the drawbacks of life in which everyone is on the record, all the time. A spouse in a divorce case might ask for Web-browsing histories to show the other spouse’s peccadilloes or peculiar interests. Vetting applicants for jobs—or nominees for official positions—could become even more intrusive than it is already, and even less forgiving of adventure or eccentricity, in an extension of today’s “just Google him” effect.
No one suggests that an online firm will deliberately disgorge everything it knows about you. Technically, Google could list every IP address that has ever launched a search for “underaged hotties” or “how to make a bomb.” Commercially, that would be suicide. Since the Googles and Yahoos need users’ trust in order to keep getting data, they, like banks or credit-card companies, have a strong incentive to compete on trustworthiness. But the long-run fear is that as unprecedented amounts of personal information pile up, all of it linked by IP addresses, more will ultimately be used.
So what are we supposed to do about it? The answers I got covered a very wide range, with one area of consensus that made me think differently about how hard individual users should—or should not—try to protect their own secrets.
At one extreme is the approach that Richard Forno describes as “wearing tinfoil underwear.” Marvelous tools of disguise exist, starting with encryption software for e-mail. Perhaps the most powerful one, called Tor, can be found at the Electronic Frontier Foundation Web site, tor.eff.org. Originally funded by the Navy, the system effectively conceals a user’s IP address by bouncing every Web query among routers around the world, making it harder to trace back to its origin. Tor is free but somewhat tricky to install (I have succeeded, but it took time), and it slows Web response time noticeably. I would use it only if I were working on a project I really wanted to keep under cover.
There are other, more modest protective measures. Politicians and CEOs should think twice about doing anything they wouldn’t want to see on the front page of a newspaper. Everyone else should think twice before sending e-mail they would not want to see broadly forwarded. (I get and send more e-mail than ever for routine business, but stick to the phone or meetings for anything sensitive.) To keep your computer from piling up data you’d rather not have it store, you can configure your Web browser to reject all cookies, or to ask you before it accepts any. (In IE, you find this via Tools/Internet Options/Privacy. In Firefox, via Tools/Options/Privacy/Cookies.) Doing without cookies means not being able to use some sites or services at all, for instance Gmail, plus manually logging into other sites every single time. A more moderate step is to have the browser accept cookies but purge them whenever you close the browser.
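The difference between a cookie that lingers and one that vanishes when the browser closes comes down to a single attribute the site attaches. The sketch below uses Python's standard `http.cookies` module to show it. The cookie name `visitor_id`, its value, and the helper `make_cookie_header` are hypothetical, chosen only for illustration.

```python
from http.cookies import SimpleCookie


def make_cookie_header(name, value, max_age_seconds=None):
    """Build the Set-Cookie header a site would send to a browser.

    With no Max-Age (or Expires) attribute the cookie is a session cookie,
    which browsers discard on exit. With Max-Age it is stored until it expires.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age_seconds is not None:
        cookie[name]["max-age"] = max_age_seconds
    return cookie.output()


# A session cookie, and one meant to last a year (hypothetical values).
session_header = make_cookie_header("visitor_id", "abc123")
persistent_header = make_cookie_header("visitor_id", "abc123", 60 * 60 * 24 * 365)
print(session_header)
print(persistent_header)

# On a return visit the browser sends the stored pair back,
# which is how the site recognizes the same visitor.
returning = SimpleCookie("visitor_id=abc123")
print(returning["visitor_id"].value)
```

Telling the browser to purge cookies on exit amounts to treating every cookie like the first kind, whatever lifetime the site requested.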
On the other extreme is the approach Lawrence Lessig takes. “I don’t do anything” about privacy, he says. “I think there is no way to hide. I just live life thinking everything is in the open.” Esther Dyson, of CNET, says something similar. “The short answer is: Nothing,” she replied by e-mail when I asked what concealing steps she takes. “For a while I tried flagging every cookie I got, just for fun, but I let them all go through anyway, so eventually I stopped.”
Mitchell Kapor, the founder of Lotus, who now directs the Open Source Applications Foundation, does take a few protective measures. When using an Internet café, he doesn’t log on to PayPal, his credit-card account, or any other site that involves his finances, just in case some keystroke-capture program has been installed. He wasn’t comfortable using Gmail as one of his personal e-mail accounts until he grilled Google officials and determined that they “took privacy and security seriously when they store mail.” Kapor said that what changed his mind was evidence that Google understands how important a reputation for guarding privacy is to the company’s prospects. “They do take steps,” he told me, “to make sure that Google employees don’t just satisfy their curiosity by looking at people’s e-mail, as well as making sure that if you delete the account, they clobber all copies everywhere, including the backups.”
Between these alternatives—the hypercautious approach of encoding all e-mail and the fatalistic belief that Big Brother will see everything anyway—lay the surprise in what I heard from these informants. This was the idea that legislation—the intrusion of the stodgy old pre-digital government—offers modern computer users their best hope.
“When your choices are the tinfoil or doing nothing, that’s not right,” Lessig says. “I would rather think about how we could actually increase privacy without giving up the versatility of the Internet.”
For instance, a future law might require Google and other companies to strip specific IP addresses from records of searching or browsing activity that they intended to store for more than a brief period. This would be a balancing act similar to the creation of the “do-not-call” list for telemarketers. It would preserve the legitimate commercial value of aggregate data about Internet use, while protecting individuals if the records were dredged up in legal proceedings—or simply lost, stolen, or exposed through negligence or incompetence. TiVo already applies such a policy. It keeps records of aggregate viewing patterns, which is how it knows that the Janet Jackson breast exposure, from the 2004 Super Bowl, is the most replayed event in TiVo history—but it removes all evidence of which specific customers have viewed or replayed which shows.
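What keeping the aggregate while dropping the individual might look like can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how TiVo, Google, or any other company handles its records. The sample data, the documentation-range IP addresses, and the function name `aggregate_without_ips` are all invented.

```python
from collections import Counter


def aggregate_without_ips(records):
    """Count how often each item was requested, discarding who requested it.

    `records` is an iterable of (ip_address, item) pairs. Only the totals
    per item are kept, so the result holds no IP addresses at all.
    """
    return Counter(item for _ip, item in records)


# Invented raw log: each entry links an address to what was viewed.
raw_records = [
    ("203.0.113.7", "super bowl replay"),
    ("198.51.100.23", "super bowl replay"),
    ("203.0.113.7", "evening news"),
]

totals = aggregate_without_ips(raw_records)
print(totals.most_common())
```

Once the raw records are deleted, the totals can still show which item was most popular, but they can no longer answer the question of which address asked for it.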
Nicole Wong unsurprisingly rejects the idea of controls on her own company. But she also suggests that the real privacy firewall, or at least wall, will be built through legislation, rather than ever warier behavior by individual users or more restraint by companies. “I don’t think that the fix for user privacy is companies providing less service,” she told me. “At the systemic level, the solution is to limit what personal data the government can ask for”—and by extension to limit what information banks, potential employers, divorcing spouses, and other potential snoops can find out.
“This is a big, macro public-policy issue about the design of our infrastructure,” Marc Rotenberg, of EPIC, says. “It involves payment systems, communications networks, identification, transportation toll design. It’s not something that will be solved by ‘privacy survivalism’—anonymizers, dark glasses, people paying for everything in cash. Collective problems require collective solutions.”
I feel worse than I did when I started this project, because I’ve realized how fully exposed my whole life is. I feel better when I think that companies could be required to purge data every so often, or to store data in a way that makes it hard to link one person’s name and IP address to the details of what he or she has done online. Then I remember that Congress would need to concentrate long enough to enact this change in a thoughtful, far-reaching way with minimal glitches, and I really start worrying.