The collection of personal data is now ubiquitous, and people are starting to pay attention. But data-collection policies have been built primarily on what we technically can do, rather than what we should do.

The gulf between can and should has led to controversies about the sharing of student data and debate about a massive emotional contagion experiment conducted on the News Feeds of close to 700,000 users on Facebook. Researchers of all stripes are scrambling to find a clear way forward in uncharted ethical territory.

Underlying the discussion has been a tangle of big, thorny questions: What policies should govern the collection, use, and manipulation of online data by companies? Do massive online platforms like Google and Facebook, which now hold unprecedented quantities of sensitive behavioral data about people and groups, have the right to research and experiment on their users? And, if so, how and to what extent should they be permitted to do so?

While much has been written about these questions, there's also the deeper issue of how people have been attempting to answer them. That meta-issue is fundamental to the future of data collection and its use in the years ahead.

The debate around the emotional contagion experiment, for instance, is fundamentally a debate about what metaphor should guide our thinking about what the Facebook News Feed actually is. As Jeff Hancock, a co-author of the paper based on the experiment, has recognized, “[T]here’s no stable metaphor that people hold for what the News Feed is.” Proving the point, commentators have deployed a range of conflicting metaphors to argue about whether the experiment crossed the line: the experimental manipulation has been compared to a field study, to an A/B test, to books and television programs, and even to a dime left suggestively in a public phone booth. The controlling metaphor defines the moral burden of the project: If the contagion experiment is like any other routine A/B test, then there is no foul. If the contagion experiment is more, say, like a field study, it implies a greater ethical onus on the researchers’ conduct.

Contemporary ideas about data are tied up inextricably with metaphors around data. As a concept, data constantly eludes crisp definition. It is everywhere and nowhere, encompassing a mind-boggling array of people, activities, and concepts. One dictionary, taking up the challenge of definition, unhelpfully offers that data is “facts and statistics collected together for reference or analysis.” But this problem is not unique to data; humans are forced all the time to deal with broad concepts they cannot fully articulate. So people do here what they do in all cases—lean on the crutch of metaphor. Rather than talk about data directly, we analogize to better understand situations that seem to line up with the problem at hand.

This is still just a partial solution. Data escapes attempts to fit it neatly into a single conceptual box. Consider three phrases—now so commonplace as to be unremarkable—that we use to talk about data:

  • “Data Stream,” which refers to the delivery of many chunks of data over time;
  • “Data Mining,” which refers to what we do to get insightful information from data; and
  • “The Cloud,” which refers to a place where we store data.

These tropes are notable because they use distinct, physical metaphors to try to make sense of data within a specific context. What’s more, all three impute radically different physical properties to data. Depending on the situation, data is either like a liquid (data streams), a solid (data mining), or a gas (the cloud). Why and how these metaphors get used when they do is not immediately obvious. There are tons of alternatives: Data could be stored in a “data mountain,” or data could be made useful through a process of “data desalination.”

The metaphors we use matter, because metaphors have baggage. Metaphors are encumbered with assumptions, and when people use metaphors, they embed those assumptions in the discussion. These assumptions are the residue of the physical analogues from which the metaphors draw. Referring to “data exhaust,” a term sometimes used to describe the metadata that are created in the course of day-to-day online lives, reinforces the idea that these data, like car exhaust, are unwanted byproducts, discarded waste material that society would benefit from putting to use. On the other hand, calling data “the new oil” carries strong economic and social connotations: Data are costly to acquire and produced primarily for commercial or industrial ends, but bear the possibility of big payoffs for those with the means to extract them.

What’s more, metaphors matter because they shape laws and policies about data collection and use. As technology advances, law evolves (slowly, and somewhat clumsily) to accommodate new technologies and social norms around them. The most typical way this happens is that judges and regulators think about whether a new, unregulated technology is sufficiently like an existing thing that we already have rules about—and this is where metaphors and comparisons come in.

Is the Internet like a utility, for purposes of protecting net neutrality? Is a website like a hotel or restaurant, so that it needs to be accessible to people with disabilities? In terms of constitutional protections from warrantless search, is your car like your home, or is your cell phone like your body? These comparisons come up all the time, and have huge implications for how we collect, regulate, value, and compensate for data.

What’s notable about dominant data metaphors is that they consistently compare data to naturally occurring physical resources. And just as the history of resource exploitation in America—from westward expansion through the Gold Rush, and beyond into modern-day debates about water and air rights—involves the appropriation of resources that belonged to someone else, online data collection policy treats personal information as a natural, inexhaustible good—ripe for exploitation in the name of economic growth and private gain.

And in all our talk about streams and exhaust and mines and clouds, one thing is striking: People are nowhere to be found. These metaphors overwhelmingly draw from the natural world and the processes we use to draw resources from it; because of this, they naturalize and depersonalize data and its collection. Our current data metaphors do us a disservice by masking the human behaviors, relationships, and communications that make up all that data we’re streaming and mining. They make it easy to get lost in the quantity of the data without remembering how personal so much of it is. And if people forget that, it’s easy to understand how large-scale ethical breaches happen; the metaphors help us to lose track of what we’re really talking about.

Replacing ingrained metaphors with entirely new ones is never easy, but those concerned about the ethically combustible mix of platforms, data, and social science might rewire the discussion by taking dominant frameworks precisely at their word.

This post is based on research from the Intel Science and Technology Center for Social Computing.