This story began with a simple question: if a facial recognition system processes a lot of pictures of a child, will it recognize that person when he or she grows up?
If I were to upload all my childhood photos to Facebook (or some future Facebook), could a biometric identification system link the button-nosed, round-cheeked child with a bowl cut to my adult face, which has lost its button, cheeks, and hair?
It's not an idle question: parents are posting millions of photos of their children to social networking sites, as are kids themselves when they are old enough to use Facebook and the like. Will these photos permanently identify them as they grow older, linking their childhood or teenage antics to their adult identities?
Or does the natural aging process provide some level of protection from the prying computations of facial recognition algorithms? If I can barely identify myself in photos from my childhood, what hope does a computer have?
There is no simple answer, though there is a good theoretical lower age-limit: it would be very difficult for a facial recognition system to match up a photograph of a child under the age of seven with a photograph of that same person as an adult.
And, in practice, most facial recognition systems aren't close to being able to do this kind of identification in the field. Still, that might not allay the worries of technology thinkers like Amy Webb, who recently warned parents to post no photographs of their children online because "ubiquitous bio-identification is only just getting started."
What's at stake is this: how firmly do we want the media that children produce to attach to their adult identities? For most current adults, the pictures and videos we made as kids are not searchable or accessible, except for the hand-curated selections of "throwback Thursday."
Kids now are growing up on the Internet are trailed by an ever larger and deeper digital footprint. The danger is that it might restrict their freedom to develop as future people. Algorithms rely on what they know about someone's childhood to channel their possibilities as an adult.
If pictures (or YouTube videos) from your youth can be connected to your adult identity, it would, at the very least, increase the ethical complexity of posting or hosting images of children.
Let's get into the details.
This kind of facial recognition work emerges from very different places: forensic scientists, pure computer scientists, and facial recognition practitioners. Forensic scientists are trying to solve a very practical problem: if a child goes missing for some long period of time, how can law enforcement create a more up-to-date portrait of the child? They want a system that can artificially age a missing child's face. We all know kids change, but that's not the kind of rigorous analysis one needs to Photoshop five years onto a child's visage. Artificial aging is almost the reverse of what a facial recognition system would do.
The fastest changes are between infancy to 3, and then during adolescence (after 10 years old) into adulthood, said Alex Cybulski, a doctoral candidate at the University of Toronto Information School, where he's studying surveillance. "You can understand how this complicates things as the changes to the craniofacial shape and texture of a face during the early period of an individual's life are rapid and thus elusive to estimation by computer modeling for the purposes of facial recognition."
Elusive, perhaps, but not impossible. Cybulski pointed to the work of forensic researcher Stuart Gibson at the University of Kent, who "has proposed that because of the way in which the face changes during [childhood] starting at age seven is considered the maximum range through which changes can be estimated and therefore, reliably compared."
What Gibson has done is try to take images of children at various ages and build computer models that attempt to artificially age them. So, for example, here, the photos on the far left (A) and right (F) are actual pictures of the subjects. Columns B through E show different algorithmic projections based on his models.
One can imagine that these attempts to quantitatively model the changes in bone structure, skin texture, and other aesthetic variables might lead to a better facial recognition system.
Another place this quixotic question led me was to mathematicians like Nigel Boston at the University of Wisconsin, Madison. He referred me to the work of UCLA's Stephen Soatto.
For Soatto, a face is shape-space with certain properties. "Your identity is what is left invariant by some class of transformations," is how he put it to me. For him, the problem is that if we want to match photos or discriminate between individuals, there are two kinds of variability. One is intrinsic—my face now versus my face 25 years ago—but the other is "nuisance variability," or features of images that are irrelevant to my identity.
Soatto wrote a paper on getting rid of this kind of variability with respect to focal length in images, which heavily distort people's faces (especially the front-facing one on your phone: "you cannot see your ears, your nose looks bigger"). And he saw a parallel between the mathematics underlying that research and quantifying the effects of aging. In our specific question, the nuisance variability we are trying to eliminate is time, and "the way in which time affects your data is very complex, but mathematically it is a 1-parameter morphism that deforms your face," Soatto said.
He believes that using the same methodology as in the focal length study could be applied to aging and facial recognition. They could feed lots of images into the model and "learn away the variability," he said. "Conceptually, it is exactly the same thing. The only difficulty is getting the data for this. You need consent and it would be a long, longitudinal study."
The challenge, of course, is that we all age differently, but "there is geometric consistency because faces are not arbitrary objects." One can expect crow's feet and a wobblier jaw, for example. Or for children, one can expect the size of the forehead to even out relative to the rest of the face.
Perhaps some set of the dozens of possible landmarks on a face hold constant, or the relationships between them do.
How good could a computer system get at this kind of computation?
"Suppose we take your two photos separated by x years. To keep things simple, assume they are portraits (frontal pose, neutral expression and uniform illumination). Generally, if x < 10 years, high matching accuracy will be maintained by state of the art face recognition systems. But because different persons age differently, this value of x may vary from person to person," a computer vision practitioner Anil Jain, a professor at Michigan State, told me in an email.
Children, of course, would be more difficult. And, Jain noted, that face matching in "unconstrained settings" like surveillance video or random snapshots "poses challenges."
But most of the work that's been done by academics draws on dozens to thousands of photographs. What happens when one attains Facebook scale, billions of photos, with thousands (or even tens of thousands) of images of an individual through time?
"We will likely see much more work on this once social networks have built up libraries of digital images of people at different ages," said Yana Welinder, an affiliate scholar at Stanford's Center for the Internet and Society, who has studied facial recognition. You can bet that Facebook is going to try to identify its users no matter how old they are, or in whose pictures they appear. And they'll probably get good, too, as the "unreasonable effectiveness of data" makes their algorithms better.
Will computers ever get better than humans?
Soatto, for one, doubts it. Facial recognition, after all, is a remarkably difficult task that (almost) all humans are exceptional at. "The reality is that it is so complex and humans are so attuned to very subtle cues on the human face that it would be very difficult for an engineered system to mimic or exceed the extrapolated abilities of humans," he said.
The problem is that it goes to "the core of what knowledge and learning is," Soatto said. "There is tons of data and the data is not information. Information is what is left in the data after you throw away what doesn't matter to your task. You've picked one source of nuisance variability, which is age, and that's a tough one. But that same conundrum permeates every other branch of knowledge and learning."
The fundamental nature of the problem might suggest that until the machines can learn as well as we do, it will be difficult for them to overtake us in facial recognition tasks.