A barista gets burned at work, buys first-aid cream at Target, and later that day sees a Facebook ad for the same product. In another Target, someone shouts down the aisle to a companion to pick up some Red Bull; on the ride home, Instagram serves a sponsored post for the beverage. A home baker wishes aloud for a KitchenAid mixer, and moments later an ad for one appears on his phone. Two friends are talking about recent trips to Japan, and soon after one gets hawked cheap flights there. A woman has a bottle of perfume confiscated at airport security, and upon arrival sees a Facebook ad for local perfume stores. These are just some of the many discomforting coincidences that make today’s consumers feel surveilled and violated. The causes are sometimes innocuous, and sometimes duplicitous. As more of them come to light, some will be cause for regulatory or legal remedy.
But none of this is new, nor is it unique to big tech. Online services are only accelerating the reach and impact of data-intelligence practices that stretch back decades. They have collected your personal data, with and without your permission, from employers, public records, purchases, banking activity, educational history, and hundreds more sources. They have connected it, recombined it, bought it, and sold it. Processed foods look wholesome compared to your processed data, scattered to the winds of a thousand databases. Everything you have done has been recorded, munged, and spat back at you to benefit sellers, advertisers, and the brokers who service them. It has been for a long time, and it’s not going to stop. The age of privacy nihilism is here, and it’s time to face the dark hollow of its pervasive void.
Many people still think their smartphones are listening to them in secret—recording their conversations in the background, then uploading them to Facebook or Google surreptitiously. Facebook has been accused of the practice more than others, probably because its services (including Instagram) are so popular and ads are so easy to spot. The company denies doing so every time, and researchers have shown it to be technically infeasible, too. But the idea still persists.
It persists because it feels true, and also because it is true, by the spirit if not the letter. Facebook and Google might not literally be listening in on our conversations, but they are eavesdropping on our lives. These companies have so much data, on so many people, and they can slice and dice it in so many ways that they might as well be monitoring our conversations. Traveling out of town and searching for restaurants? It’s not just that Facebook or Google knows where you are and what you’re searching for, but also if you’re a foodie or a cheapskate, if you’ve “liked” Korean hot pot or Polish pierogi, and what your demographics say about your income, and therefore your budget.
Tech companies do collect data in unexpected, and sometimes duplicitous, ways. Facebook’s Cambridge Analytica catastrophe offers one example. More recently, a report based on research at Vanderbilt University suggests that Google collects or infers vast quantities of information about its users, based on their web browsing, media use, location, purchases, and more—sometimes even absent user interaction. Location data was particularly voluminous, with Android smartphones conveying a user’s position in space more than 300 times in a 24-hour period—even if the user has turned off location history in the device’s Google settings. The study also shows that the “incognito” mode in Google’s Chrome browser, which promises to hide a user’s information from websites while browsing, still makes it possible for Google to connect those supposedly hidden visits to its own, internal profile of a user.
Revelations like these have spawned a class-action lawsuit against the company, and it’s tempting to imagine that oversight, regulation, or legal repercussions might eventually discourage or even change the way tech companies collect and manage data. This hope jibes with the ongoing “techlash” that has consumed the sector for the past year or more. But it also ignores the fact that Google and Facebook’s data hunger takes place within the context of a widespread, decades-old practice of data intelligence.
For years, companies slurped up, bought, and sold that data to hone their marketing and sales efforts. But with the rise of big tech firms, the stakes changed. Data collection’s secret grift globalized, and centralized. Now a batch of computer dorks know everything you say, do, dream, and desire—even the stuff you’re too ashamed to admit to yourself. Data brokering used to be a somewhat disgraceful, shadowy business. Now it’s mainstream. The tech companies are not ashamed of the empires they have built, or the means by which they have done so. Instead, they relish the profits they reap from the remnants you have left behind, and they do so out in the open. The only thing worse than a bandit acting out of spite is one who feels nothing at all as he plunders your secrets.
Since it’s been possible to keep records, companies have sought to use and benefit from the information they possess. The term “business intelligence” was first coined way back in 1865, in Richard Miller Devens’s book Cyclopaedia of Commercial and Business Anecdotes—a title that feels like it should come with a top hat. Devens studied the ability of merchants and bankers, starting in the 17th century, to benefit from access to information (about war, about competition, about weather, and so on).
Almost a century later, in 1958, the IBM engineer Hans Peter Luhn rekindled the concept for the information age. By then, machines like the ones IBM manufactured had made business intelligence easier, but Luhn identified its intractable challenges: Acquiring and storing data was just the start; it also had to be retrieved and acted upon. Those problems would take another couple decades to resolve.
The most important advance in the process came in 1969, when the computer scientist Edgar F. Codd, also working at IBM, developed a new paradigm for data storage and processing. Codd’s “relational model” was soon realized in software products, known as relational databases, which, starting in 1978, were commercialized by IBM and others. Relational databases made it easy to run “queries” against large, diverse data sets. Sales could be correlated against regions, or suppliers. Prospects could be connected to conversions. The individual actions of particular customers could be aggregated into patterns. And it could all be done quickly, using recently updated information.
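To make the idea concrete, here is a minimal sketch of the kind of query the relational model enabled, using Python’s built-in sqlite3 module. The tables, column names, and figures are invented for illustration; the point is the pattern: rows in tables, joined and aggregated by a declarative query, so that sales can be correlated against regions in one statement.

```python
import sqlite3

# Hypothetical tables: customers tied to regions, sales tied to customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sales (customer_id INTEGER, product TEXT, amount REAL);
    INSERT INTO customers VALUES (1, 'Midwest'), (2, 'Midwest'), (3, 'South');
    INSERT INTO sales VALUES (1, 'mixer', 199.0), (2, 'mixer', 199.0),
                             (3, 'kettle', 39.0);
""")

# Correlate sales against regions -- laborious before relational
# databases, a single query after them.
rows = conn.execute("""
    SELECT c.region, SUM(s.amount)
    FROM sales s JOIN customers c ON s.customer_id = c.id
    GROUP BY c.region ORDER BY c.region
""").fetchall()
print(rows)  # [('Midwest', 398.0), ('South', 39.0)]
```

The same join-and-aggregate pattern scales from three rows to millions, which is what made the relational database the platform for the enterprise software described next.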
Almost every important enterprise-software program of the next decade—most of which ordinary people never thought about or saw—was built atop the idea of a relational database. Oracle has sold a popular one since 1979. It and other companies, including IBM, Microsoft, SAP, PeopleSoft, and Google, customized new enterprise products that used the relational database as a platform. These products are still important today. Enterprise resource-planning software tracks and manages business operations. Customer-relationship management software tracks sales and marketing activities. Supply-chain management systems help manage the flow of components and raw materials for manufacture and distribution. To this day, these systems make ordinary life operate. If you work a job with payroll, get products delivered from Amazon, or own a smartphone assembled from parts, you are a beneficiary of the relational-database industrial complex. And a victim of it, too: Since the 1980s, companies have been using the systems to store and act on information about who you are and what you do.
But for a long time, that information was still scattered all over the place. Your bank or the manufacturer of your vehicle might know how much money you’ve got or what car you drive, but the data was isolated in separate systems at discrete organizations. A supermarket chain might know how well a specific product line sold in a particular region, but it didn’t know much about who bought it or why.
But then organizations found ways to acquire and recombine information of all kinds. Credit-reporting services offered one way to get information about consumers. The nation’s credit bureaus—Equifax, Experian, and TransUnion—became one wellspring, selling access to such information for almost any purpose, including marketing (although legal and operational changes curtailed some of those practices over time). The rise of credit cards, debit cards, and electronic-payment systems made it easier to collect sales information, and to connect multiple purchases to specific customers. Club cards like the kind you use at the supermarket or drug store trade “discounts” on goods for a persistent trail of information tied to an address and a phone number. Disguised as loyalty programs, these efforts were only ever meant to collect information.
Data brokers also started collecting and selling specific types of data, like lists of sales prospects for particular categories of goods. Companies could purchase these lists, install them on their local enterprise systems, and then correlate the new, outside data with information they already possessed. Taken together, these factors shook privacy’s foundation, and they did it long before Google and Facebook.
In 2012, Charles Duhigg published a watershed article, “How Companies Learn Your Secrets,” about how a team of statisticians at Target figured out how to predict customer behavior and capture their business preemptively. “If we wanted to figure out if a customer is pregnant, even if she didn’t want us to know, can you do that?” the marketing team had asked—and in 2002, before Google had gone public and before Facebook even existed. The company started correlating any customer interaction—purchases, emails, surveys, coupon use—to a “Guest ID.” Target also purchased data from brokers, which might include consumer habits, political predilections, financial tendencies, and more, and attached them to those IDs. The result allowed the company to make predictions about future consumer habits, and to market to them accordingly. Target was hardly alone in this practice.
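The mechanics of such a system are simple enough to sketch. Below is a hypothetical illustration—every identifier, field name, and record is invented—of how a retailer might key its own interaction data and purchased broker data to the same Guest ID, then merge them into a single marketing profile.

```python
# In-house data, collected directly from purchases, surveys, coupon use.
first_party = {
    "guest-4417": {"purchases": ["unscented lotion", "prenatal vitamins"]},
}

# Outside data, bought from a broker and keyed to the same hypothetical ID.
broker_data = {
    "guest-4417": {"age_bracket": "25-34", "homeowner": True},
}

def build_profile(guest_id):
    """Merge in-house and purchased records into one profile."""
    profile = dict(first_party.get(guest_id, {}))
    profile.update(broker_data.get(guest_id, {}))
    return profile

profile = build_profile("guest-4417")
print(profile["age_bracket"])  # "25-34"
```

Once every interaction and every purchased attribute hangs off one key, predictions about a customer become a matter of querying the merged record.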
The results felt just as uncanny as today’s supposed Facebook stalkings. More than five years ago, my Atlantic colleague Alexis Madrigal tried to figure out why he started receiving baby catalogs in the mail, even before he and his wife had told anyone they had a child on the way. He traced the catalog to a data broker, which explained that Madrigal’s previous purchases of gifts for nieces and nephews had flagged his household as consumers of children’s apparel, merchandise, and toys. That was why the catalogs arrived; the fact that they were expecting was coincidence. “There was no evil machine that was one step ahead of our own desires,” Madrigal wrote.
That’s true of most of today’s uncanny correlations, the ones people try to pin on a surveillance conspiracy. Someone shouting down the aisle for Red Bull has probably bought Red Bull before. Buying international flight tickets already marks someone as a traveler likely to do so again. Someone who goes to the trouble of making their own dough probably has made other purchases (or visited websites) that make a KitchenAid mixer an obvious match.
The KitchenAid ad or the baby catalog feel different and new because a few things have changed in the data-privacy swamp. For one, data brokerage has expanded ceaselessly over the past few decades. In 2014, ProPublica published an extensive investigation of the various information about individual citizens that companies buy and sell. The results are so far-ranging that they seem almost fictional. Lists of romance-novel readers are for sale. Contributors to international charities. Divorcees. Equifax, the credit bureau, gets pay-stub data from many companies in exchange for employment-verification services. And on and on. If your brain can imagine it, there’s probably a data source for it that someone can buy.
But more importantly, the velocity of acquisition and correlation of information has increased dramatically. Web browsers and smartphones contribute to that, in volume and value. The granularity of location information like the kind Google appears to have been collecting on the sly allows the company to infer connections to specific places its users shop, seek medical treatment, or hang out. It also allows the company to correlate those places with other activities conducted before or after, like a web search before departure or a YouTube video watched on site. Facebook’s entire business model is based on reaping the benefits of that information, and allowing the marketers who use it to cross-reference it with their own. The techlash fallout has forced the company to reassess some of that practice, including facilitating discriminatory ad targeting, but that’s a small detour on a long journey.
The process of correlation has become more sophisticated, too. The venture capitalist Benedict Evans recently made a convincing case that machine learning, a type of computational data analysis currently enjoying a lot of confused hype, has the potential to become as important to the future of human life as the relational database was in the early 1970s. The connections that feel uncanny are actually the outliers, because they are the ones we notice. What about all the rest that go unobserved, linking behaviors in ways individuals haven’t even thought to think? Those are the links that machine learning promises to bind.
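A toy stand-in for that kind of pattern-finding: count which pairs of behaviors co-occur across customers, and surface the most frequent ones. Real machine-learning systems are vastly more sophisticated than this, and the “baskets” below are invented, but the principle is the same: the links emerge from the data rather than from anyone’s hypothesis.

```python
from itertools import combinations
from collections import Counter

# Invented behavior sets for three hypothetical customers.
baskets = [
    {"flour", "yeast", "mixer ad click"},
    {"flour", "yeast", "mixer ad click"},
    {"flour", "diapers"},
]

# Tally every pair of behaviors that appears together for a customer.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pairs are the links no one thought to look for.
top_pair, count = pair_counts.most_common(1)[0]
```

Scaled from three customers to billions of data points, this is how correlations too faint for any human to notice become routine inputs to ad targeting.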
The centralization of information has also increased. With billions of users globally, organizations like Facebook and Google have a lot more data to offer—and from which to benefit. Enterprise services have also decentralized, and more data has moved to the Cloud—which often just means into the hands of big tech firms like Microsoft, Google, and Amazon. Externalizing that data creates data-privacy risk. But then again, so does storing it locally, where it is susceptible to breaches like the one Equifax experienced last year.
The real difference between the old and the new ages of data-intelligence-driven consumer marketing, and the invasion of privacy they entail, is that lots of people are finally aware that it is taking place. The Cambridge Analytica scandal, the recent reports about Google, and related events have contributed to that knowledge, but not as much as the barrage of rapidly correlated advertising served in apps and on web pages. The postal mail comes once a day, but people see hundreds or thousands of new renditions of their own private information in the same time online. It’s easy to mistake the proximate cause—big, shadowy tech firms—for the ultimate one: over half a century of business-intelligence techniques that have been honed, productized, and weaponized out of sight. Google and Facebook are just the tip of an old, hardened iceberg.
That means that easy answers, like limiting the information you give to Facebook and Google, could help, but only somewhat. Certainly it appears that using an iPhone instead of an Android helps hide your physical location better. Regulation or legal action could also reverse some excesses of the data economy. But ultimately it’s a losing battle. Are you really going to stop using Google? Or quit Facebook? Or stop browsing the web? Or leave your smartphone behind? Or disable location services in its hardware settings? Maybe some people will, for now, for a time, but then the reality of contemporary life will corral them back into these services. Eventually, it will become impossible. Unless you are independently wealthy, you can’t opt out of the credit services. Even if you never use your credit card, your employer might be giving your data to the agencies that manage it anyway. You can’t forgo the supermarket, or the drug store, or the Target, where every purchase is stored and linked to every other. There is no escaping the machinery of actual life, no matter how many brows get furrowed or tweets get sent about it.
It’s a shorthand comfort to blame Google for this state of affairs. It puts a face on the scourge, and it targets an enemy that feels worthy. But the opponent in the data-privacy invasion is not a comic-book enemy of fixed form, one that can be cornered, compromised, and defeated. Instead it’s a hazy murk, a chilling, Lovecraftian murmur that can’t be seen, let alone touched, let alone vanquished. Even “the Cloud” isn’t the right metaphor, because pumping out its gaseous poison only draws in a new, cold draft of it from sources unseen. If not websites, pharmaceuticals. If not location data, household goods. If not likes or shares, bank balances and neighborhood demographics. Your data is everywhere, and nowhere, and you cannot escape it, or what it might yet do to you.