How can our brain discriminate and identify a particular face among a virtually infinite number of extremely similar faces? Doris Tsao has solved this enigma.
In one of his most famous clinical tales, The Man Who Mistook His Wife for a Hat, the American neurologist Oliver Sacks wrote about a patient who was unable to recognize particular faces, a syndrome called prosopagnosia.
Such a brain impairment is almost unfathomable. We are usually very good at recognizing faces – because we absolutely need to be able to tell faces apart in order to succeed in our most basic social interactions.
Doris Tsao, who works at Caltech, discovered how neurons in our brain encode faces. With her team, she effectively “cracked the neural code” – the Rosetta Stone, as she calls it – for faces. Recently, this Chinese-born, Maryland-raised woman of 42 – who has been pursuing this goal since her graduate student days – gave a fascinating talk at the Champalimaud Centre for the Unknown about her work. We asked her a few questions about it.
You “cracked the neural code” for faces. What does that mean exactly?
It means that we can use the activity of neurons to reconstruct faces. Specifically, we show a monkey an arbitrary face while we record neural activity in certain areas of the visual part of its brain. Then, based on the responses of those neurons, we can reconstruct [with a computer] the face that the monkey saw.
Why did you focus on faces in particular?
Because we already know a lot about the pathway for face processing in the brain. We have no idea which cells are encoding a sweater or a table, for example. But we know which parts of the brain are responsible for coding faces. And we know this because we can map these areas using fMRI.
What are these areas?
We call these areas “face patches”, and they were actually first discovered in humans. There are six of them in each hemisphere, in the inferior temporal cortex.
How does face recognition work? What do these neurons actually compute to reconstruct a face?
You can make an analogy to the RGB system for colors, where any color can be encoded as coordinates along three axes: red, green and blue. The retina works exactly like this: it has three types of cells dedicated to color coding, namely “red” cells, “blue” cells and “green” cells.
What these cells do is just to transform, to “project”, an incoming color along different color “axes” [in a virtual 3D space].
If such a cell responds only to red, it will transform an incoming patch of color into a coordinate only along the “red” axis. But to encode the magentas, for instance, which are a combination of red and blue, the cells that respond to red and the cells that respond to blue both have to contribute, projecting values onto the “red” axis and the “blue” axis respectively.
What we discovered was that the face cells are doing the same thing, but instead of projecting inputs onto axes in a virtual “color space”, they’re projecting them onto axes in what we call “face space”. You can think of each axis as one facial feature.
So, just like the color cells, the face neurons perform an extremely simple operation: they transform an incoming face into values along different shape “axes” and appearance “axes” in face space.
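As a rough illustration of the projection idea described above (an editorial sketch, not the lab's code), a model face cell can be written as a dot product between a face's coordinates in face space and that cell's preferred axis. The six-dimensional space, the numbers, and the names `project`, `shape_cell` and `mixed_cell` are all hypothetical.

```python
# Toy sketch: a "face cell" as a projection onto one axis of face space.
# All numbers and names are invented for illustration.

def project(face, axis):
    """Return the coordinate of `face` along `axis` (a plain dot product)."""
    if len(face) != len(axis):
        raise ValueError("face and axis must have the same number of dimensions")
    return sum(f * a for f, a in zip(face, axis))

# A face described by 3 shape coordinates followed by 3 appearance coordinates.
face = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]

# Two hypothetical cells: one listens to a single shape axis,
# the other mixes one shape axis with one appearance axis.
shape_cell = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mixed_cell = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0]

responses = [project(face, shape_cell), project(face, mixed_cell)]
print(responses)
```

In this toy picture, each cell reports a single number, and the set of numbers across many cells plays the same role for a face that the three RGB values play for a color.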
So what you’re saying is you have a certain number of shape and appearance axes.
Yes, there are two types of axes. Here, it gets slightly technical; figuring out what these axes were was really one of the big steps that helped us crack the face code.
The easy way to think about it is to say that the various shape axes tell us about the skeletal shape of the face – the distance between the eyes, face width, hairline width, and so on. And the different appearance axes tell us about the texture map of the face – that is, the pixel values of the face.
Do you think the brain actually uses such a code?
We know it’s using this code. We think that, physically, the brain’s mechanism for coding faces is exactly like the RGB code for colors. There’s just a higher number of dimensions for faces.
How many axes are there in “face space”?
We use 50 axes: 25 of them are shape axes and 25 are appearance axes. We could have chosen 51 or 49; we can use various computational techniques to ask how many axes the brain is actually representing.
Tell us more about the experiment you did on the macaque, which allowed you to reconstruct a specific face from neural activity.
We generated 2,000 random pictures of faces which presented variations on the “features” measured by the 50 axes. We then recorded the activity of 205 randomly selected face cells [as the monkey was seeing each face] to obtain the response of every cell to the same face.
Finally, we were able to use the activity of those 205 face cells to reconstruct each of the 2,000 faces. Using the responses of the neurons to all but one of the faces, we developed an algorithm that was able to predict, with great accuracy, what the remaining face – which it had never “seen” before – looked like.
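The leave-one-out procedure described above can be sketched on synthetic data. The interview does not say which algorithm was used, so the linear least-squares decoder below is an assumption, as are the scaled-down sizes, the noise level, and the names `decode_held_out`, `axes` and `responses`. It shows only the general shape of such an analysis, using NumPy.

```python
import numpy as np

# Toy simulation; sizes are scaled down from the 2,000 faces, 205 cells
# and 50 axes described above, and every value here is synthetic.
rng = np.random.default_rng(0)
n_faces, n_dims, n_cells = 200, 6, 20

faces = rng.normal(size=(n_faces, n_dims))     # face-space coordinates
axes = rng.normal(size=(n_dims, n_cells))      # each column: one cell's preferred axis
noise = 0.1 * rng.normal(size=(n_faces, n_cells))
responses = faces @ axes + noise               # assumed linear "firing rates"

def decode_held_out(i):
    """Fit a linear decoder on every face except face i, then predict face i."""
    train_r = np.delete(responses, i, axis=0)
    train_f = np.delete(faces, i, axis=0)
    # Append a constant column so the decoder can learn an offset.
    design = np.hstack([train_r, np.ones((n_faces - 1, 1))])
    weights, *_ = np.linalg.lstsq(design, train_f, rcond=None)
    return np.append(responses[i], 1.0) @ weights

predicted = np.array([decode_held_out(i) for i in range(n_faces)])
decoding_error = np.linalg.norm(predicted - faces, axis=1).mean()

# Baseline for comparison: always guess the average of all faces.
baseline_error = np.linalg.norm(faces - faces.mean(axis=0), axis=1).mean()
print(f"decoder: {decoding_error:.3f}  baseline: {baseline_error:.3f}")
```

Because each held-out face is excluded when its decoder is fitted, a low decoding error here means the decoder generalizes to a face it has never "seen", which is the point of the test described in the interview.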
Just 205 neurons were needed to reconstruct every one of the 2,000 different faces?
Yes. The fact is that this code provides an extremely compact way to represent facial identity. It only needs a small number of cells (around 200).
So that’s how we humans are able to distinguish a virtually infinite number of faces?
Yes. And again, it’s really the same principle which enables you to encode an infinite number of colors using just three types of cells.
Once the brain has encoded a face in face space, where does that information go?
We don’t know the details. We’ve now started recording activity in a region which is downstream of the face patches, where we think that contextual cues, such as familiarity and spatial context, start to play a role.
But it can basically go anywhere. For example, I can tell you to clap your hands if you see person A and frown if you see person B. Right now, within the ten seconds it took me to say that sentence, you have wired up your representation of person A to your “clapping neurons” and your representation of person B to your “frowning neurons”.
Could this work have implications for machine face recognition?
I don’t know if it’ll impact performance, but I think it will tell us how to build an artificial network that more closely mimics the brain, obviously. And I think that in that dialog, we will also understand better why the brain is constructed this way. Because we found this code, but we don’t really know why the brain is using this one and not some other one. And once we discover why the brain is using this code, I think that’s going to tell us something profound.
Ana Gerschenfeld works as a Science Writer at the Science Communication Office at the Champalimaud Neuroscience Programme.
Edited by: Liad Hollender (Science Communication Office). Photos: Tor Stensola and Doris Tsao.