I See You: The Databases That Facial-Recognition Apps Need to Survive

No matter how powerful a facial-recognition technology is, it can’t get the job done without a database that links names to faces, such as those owned by Facebook or LinkedIn.

I spy a face. (Nottsexminer/Flickr)

Privacy concerns have been ignited by “NameTag,” a facial-recognition app designed to reveal personal information after analyzing photos taken on mobile devices. Many are concerned that Google Glass will abandon its prohibition on facial-recognition apps. There are also open questions about the proper protocols for opting customers in and out of services that identify people through facial comparisons in real time. These kinds of services are technically “face matching” services, though they are colloquially referred to here as “facial-recognition technologies.”

Ultimately, the coming wave of consumer facial-recognition technologies brings bad and good news. The bad news is obvious: Automatically identifying one of our most distinctive and personal traits raises serious privacy concerns ranging from stalking to loss of obscurity in public.

The good news is that facial-recognition technology, at least the kind that could be used at scale to identify most people in any given place, has an Achilles heel that buys society enough time to respond appropriately. No matter how powerful a facial-recognition app is designed to be, it can’t get the job done without being connected to a database that links names to faces, such as those owned by Facebook or LinkedIn. Going forward, the key is to ensure that legal and social pressure demands the same responsible behavior from database owners as it does from designers, hosts, and users of facial-recognition technologies.

In order for any facial-matching technology to work, algorithms must be able to accurately compare new, unknown images to older, identified ones. An app without a database of images to draw from is like a car without gasoline. You can get in, buckle your seat belt, and fantasize about a destination. But you aren’t going anywhere.
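To make the dependency concrete, here is a minimal sketch of face matching, assuming each face has already been reduced to a numeric feature vector by some upstream model. The function names, the toy three-number vectors, and the 0.9 similarity threshold are all illustrative assumptions, not a description of how NameTag or any real product works.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(unknown, database, threshold=0.9):
    """Return the name whose stored vector best matches `unknown`.

    `database` maps names to feature vectors of already-identified faces.
    Returns None when no stored face is similar enough, which is always
    the case when the database is empty.
    """
    best_name, best_score = None, threshold
    for name, known in database.items():
        score = cosine_similarity(unknown, known)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical toy data: real systems use vectors with many more dimensions.
database = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
new_photo = [0.9, 0.1, 0.0]
```

With the toy database, `identify(new_photo, database)` returns a name; with an empty one, `identify(new_photo, {})` returns `None` no matter how good the comparison function is. That is the car without gasoline.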

If a facial recognition app is going to be used to identify random strangers encountered around the country (if not the world), it must be connected to a database of images of corresponding proportion. While there are still privacy concerns with localized use of facial recognition technologies, a database that only has information about people in limited circles is of limited use.

NameTag seems ominous because it allegedly can draw from “publicly available information.” Indeed, if we imagine the app combining every photo made available by searching Google, it’s easy to assume the worst. But while NameTag users might try to populate a new database with Google results, it’s hard to imagine they could get far enough to make the effort worthwhile.

Putting copyright and contractual issues aside, hitting a critical mass of hundreds of millions, if not billions, of photos would rival Wikipedia’s project of crowdsourcing knowledge. While the public saw social good in pulling together to create an online encyclopedia, they likely won’t muster the same zeal to build a repository of faces and names for private companies to profit by minimizing privacy. And if such a project ever gained momentum, it should receive the same public scrutiny as existing substantive name and face databases.

Since it seems unlikely that NameTag (or, frankly, any other company developing facial-recognition software) will create a new massive database, the few existing large-scale facial-image repositories, such as those owned by Facebook and LinkedIn or government-owned databases such as a DMV's, will become valued targets. Because it takes too much time and effort to manually collect images from these sources, interested companies could try to collect bulk data by scraping publicly accessible profiles.

While automation increases efficiency, scraping is regularly prohibited in the terms of use for photo repository websites. Crucially, companies like Facebook are not shy about asserting their rights to protect this valuable information.

The most plausible method for an app to tap into a truly useful database of names and faces is to partner with a company that already owns a massive one. Almost overnight, owners of juggernaut image repositories like Facebook would become even more powerful by controlling the chokepoint for an entire cottage industry of facial recognition applications. These databases are currently worth a fortune. Yet, they might still be undervalued, given the great difficulty in creating alternatives with vibrant networks of people who constantly update their information.

The Federal Trade Commission (FTC) has an important role to play here. Major owners of name and face databases like Google, Twitter, and Facebook have all entered into consent decrees that require them to work with the FTC on any significant retroactive material changes to their privacy policies. Given the best practice recommendations articulated in the FTC’s report on common uses of facial recognition technology, we can hope that consumers would be safeguarded with design features that only allow facial recognition apps to reveal information about users who have voluntarily opted into a system.
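As a rough illustration of what such an opt-in design feature could look like on the database owner's side, the sketch below only returns a name when the person has explicitly opted in. The `Profile` class, the `opted_in` flag, and the `reveal` function are hypothetical names chosen for this example. They are not drawn from the FTC report or from any company's actual system.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """A hypothetical database record linking a name to an opt-in choice."""
    name: str
    opted_in: bool = False  # assumption: nobody is identifiable by default


def reveal(profile_id, profiles):
    """Return the person's name only if they opted into recognition.

    Unknown IDs and people who have not opted in both yield None, so a
    requesting app cannot tell the two cases apart.
    """
    profile = profiles.get(profile_id)
    if profile is None or not profile.opted_in:
        return None
    return profile.name


# Hypothetical records: one person opted in, one did not.
profiles = {
    "u1": Profile(name="Alice Example", opted_in=True),
    "u2": Profile(name="Bob Example"),
}
```

The design choice worth noting is the default: because `opted_in` starts as `False`, a person becomes identifiable only through an affirmative act, rather than having to discover the system and opt out.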

But what about companies that aren’t under a consent decree with the FTC, like online dating websites? In NameTag’s press release, it disclosed that it is currently creating technology to allow profile photos to be scanned from dating sites such as PlentyOfFish.com, OkCupid.com and Match.com. The terms of service agreements required by these websites appear to prohibit the bulk collection of information. If the way around this limitation entails a mutual agreement to obtain the name and face database in bulk, consumer protections will lie in the contractual relationships between database owners and applications.

The FTC relies heavily upon consumer expectations and industry standards when deciding how to use its discretion to regulate unfair and deceptive trade practices. Thus, if society is rightfully concerned about the use of facial recognition technologies, it is incumbent upon individuals and organizations to ensure that owners of massive name and face databases are subject to the same, if not greater, accountability pressures that are placed upon facial recognition software and hardware developers. Owners of such databases can protect individuals by refusing to let applications access their information, by making applications obtain explicit consent to be recognized, by restricting any problematic uses such as stalking and harassment, and by using contracts to keep any app developer accountable to the database owner as well as those who are to be identified.

Regardless of what happens to NameTag, facial-recognition apps and the platforms for running them should continue to grow faster than massive consumer image databases. But even if facial-recognition software and hardware are inevitable, the creation of massive consumer image databases need not be.