How an Attempt at Correcting Bias in Tech Goes Wrong

Google allegedly scanned volunteers with dark skin tones in order to perfect the Pixel phone’s face-unlock technology.

[Image: A homeless encampment in San Francisco. Jeff Chiu / AP Images]

As Silicon Valley pushes facial recognition as a convenient means to secure your laptop, board a flight, or pay for dinner, it has run into a problem: Computer vision systems have repeatedly misidentified dark-skinned black people as criminals, labeled them as gorillas, or simply failed to see them altogether.

These horrifying incidents are the unintentional results of harder-to-spot bias in the manufacturing process. When a data set used to train AI to “see” doesn’t include enough people with dark skin (an underrepresentation bias), the resulting technology works differently on lighter skin than it does on darker skin (an accuracy bias). Garbage in, garbage out; racism in, racism out.

The natural solution, it would seem, is to train AI on diverse data sets. But this imperative creates its own problems. Last week, the New York Daily News reported that Google had sent contractors to Atlanta, Los Angeles, and college campuses across the country to collect biometric data that it could use to train the facial-recognition software in its Pixel phones. According to the Daily News, the contractors offered subjects $5 Starbucks gift cards in exchange for 3-D scans of their faces, taken with the Pixel. Google allegedly gave the contractors daily quotas, ordered them to prioritize subjects with dark skin, and encouraged them to approach homeless people, whom it expected to be most responsive to the gift cards and least likely to object or ask questions about the terms of data collection.

Managers reportedly encouraged contractors to mischaracterize the data collection as a “selfie game,” akin to Snapchat filters such as Face Swap. College students who agreed to the scans later told the Daily News that they didn’t recall ever hearing the name Google and were simply told to play with the phone in exchange for a gift card. To entice homeless users in L.A. to consent, contractors were allegedly instructed to mention a California law that allows the gift cards to be exchanged for cash. On paper, the whole episode was an attempt to diversify AI training data while compensating people for their information. The result, though, is bleakly dystopian.

According to The New York Times, Google temporarily suspended the data collection, pending an internal investigation. In an emailed statement to The Atlantic, a Google spokeswoman said, “We’re taking these claims seriously and investigating them. The allegations regarding truthfulness and consent are in violation of our requirements for volunteer research studies and the training that we provided.”

It’s baffling that this purported scheme, which the Daily News’s reporting suggests commodified black and homeless Americans, was intended to reduce racial bias. But as the Harvard technologist Shoshana Zuboff has argued, people have always been the “raw materials” for Big Tech. Products such as the Pixel and the iPhone, and services such as Google and Facebook, collect our data as we use them; companies refine that data, and, with each new generation, sell us more advanced products that collect more useful data. In this framework, our habits, our choices, our likes, and our dislikes are not unlike soybeans or petroleum or iron ore—natural resources that are extracted and processed by huge firms, for massive profit.

Sometimes this looks like a smart thermostat getting better at predicting how cool you like your home, and sometimes it looks like a $1 trillion company allegedly offering $5 gift cards to homeless black people to better sell a $1,200 phone.

As the techlash continues, some lawmakers are seeking to empower their constituents to demand that companies such as Google pay users for their data. California and Alaska have debated legislation to charge companies for using people’s personal data. Andrew Yang, the 2020 Democratic presidential candidate, has advocated treating data as a “property right.” The Facebook co-founder Chris Hughes suggests a “data dividend,” a revenue tax on companies monetizing enormous amounts of public data, paid out to users across the country, like universal basic income.

But following that line of thinking makes it clear that we still have no ethical or economic framework for valuing data collected from people across different social contexts. Should tech companies pay more for dark-skinned subjects because they’re underrepresented in training data? If our bodies are commodities, what’s a fair price, and who should set it? The data-ownership idea is, fundamentally, limited: Even if we manage, with the help of Hughes or Yang or state legislatures, to negotiate a high price for our data, we’re still for sale.

In a backwards way, movements to pay users for the data that tech companies take from them only corroborate the process by which Silicon Valley turns our faces into commodities. Imagine an unregulated race-to-the-bottom market in which companies target the most vulnerable people for their data, restrained only by an alarmingly low bar for consent. It would look a lot like paying homeless people $5 for a face scan.