The math of facial recognition

When Kristian Danev woke up in Buenos Aires on a February morning in 2018, he didn't know that this was the day justice would catch up with him. By the time he was arrested at the airport, he had spent nearly a decade on the run as one of Europe's most wanted criminals, suspected of the fatal stabbing of a 24-year-old man in the town of Poděbrady in the Czech Republic in 2008. He was arrested because his face had appeared in an investigation by the Argentinean police. Unable to identify him, the police forwarded the picture to Interpol for a search in Interpol's facial recognition database. When the search yielded a hit, it was not only a triumph for the police but also a feather in the cap for an ancient mathematical theorem.

Facial recognition

We humans can recognise a face in an instant. But for a computer to identify a face, the facial features must be converted into numbers. That process can be broken down into the following steps.

1. Detection

An algorithm searches the image and locates the face.

 
 

2. Landmarks

The algorithm locates specific points on the face, known as landmarks, such as the edge of the chin, the arch of the eyebrows or the middle of the nose.

 
 

3. Measurements

Using the landmarks, the algorithm takes measurements of the face. For example, it can determine the distance between the eyes, the length of the nose or the width of the mouth.

 
 

These measurements, 128 in total in some algorithms, are the computer's numerical description of the face. Each image of a face is thus encoded as a long row of numbers, each number describing one facial measurement. This row of numbers is called a feature vector or descriptor.

To be the same

When Interpol received the picture from the Argentinean police, their task was to compare it with the faces of wanted criminals in their database. The first step was to describe the received picture with a descriptor, one of those long rows of numbers. The next step was to compare that descriptor with the descriptors of the images in the database. That may sound like a fairly straightforward process. If the descriptors match, number by number, they describe the same person, right? But it's not quite that simple. A descriptor describes a single image of a face, and factors such as distance from the camera, facial expression and lighting mean that different images of the same person will have slightly different descriptors. Getting a hit in the search is therefore a matter of finding descriptors that are sufficiently similar. It turns out that a somewhat unlikely mathematical hero, Pythagoras' theorem, can help with just that.

Distance formula

But wait a moment. Pythagoras' theorem describes the relationship between the side lengths of a right-angled triangle. How can it help determine whether two pictures show the same face? Simply put, if two descriptors are reasonably similar, then they are close to each other in our universe of descriptors. And determining distance, well, Pythagoras' theorem is a whiz at that.

Let me explain.

If we place two points in a rectangular (Cartesian) coordinate system, we can determine the distance between them using Pythagoras' theorem. We simply treat the distance between the points as the hypotenuse of a right-angled triangle.

 
 

We can calculate the distance d with Pythagoras’ theorem.

 

If you look carefully, you will see that the lengths of the two legs of the triangle are the difference between the x-coordinates and the difference between the y-coordinates of the two points.

4 - 1 = 3 units

6 - 2 = 4 units
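
Plugging the two leg lengths into Pythagoras' theorem gives the distance d. This is simply the arithmetic written out, step by step:

```latex
d^2 = 3^2 + 4^2 = 9 + 16 = 25
\quad\Longrightarrow\quad
d = \sqrt{25} = 5
```

So the two points are 5 units apart.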

Using this observation, we can formulate the so-called distance formula. It shows how to calculate the distance between two points, if we know the coordinates of the points.

 

The distance formula in two dimensions.
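
Written out, the formula looks like this. The labels (x_1, y_1) and (x_2, y_2) for the two points are the usual textbook notation and are an assumption here:

```latex
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
```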

 

Don't be alarmed. The distance formula above is just Pythagoras' theorem in fancy clothing. The formula says that to calculate the distance between two points, you first calculate the differences between the points' respective coordinates. Then you square the differences, add them up and take the square root of the result.
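
If you prefer code to formulas, the same recipe can be sketched in a few lines of Python. The function name distance_2d is made up for this illustration, and the example points (1, 2) and (4, 6) are an assumption, chosen so that the coordinate differences are 3 and 4 as in the triangle above:

```python
import math


def distance_2d(p, q):
    """Distance between two points, each given as an (x, y) pair."""
    dx = q[0] - p[0]  # difference between the x-coordinates
    dy = q[1] - p[1]  # difference between the y-coordinates
    # Square the differences, add them up, take the square root.
    return math.sqrt(dx ** 2 + dy ** 2)


# Assumed example points; the differences are 4 - 1 = 3 and 6 - 2 = 4.
print(distance_2d((1, 2), (4, 6)))  # 5.0
```

The order of the two points does not matter, because the differences are squared before they are added.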

In fact, the distance formula also works if you want to calculate the distance between two points that are in space, i.e. in three dimensions. The only difference is that you have a third coordinate to take into account, a z-coordinate that indicates the height of the point above the xy-plane.

 

The distance formula in three dimensions.
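
With the same assumed labelling as in two dimensions, now with a z-coordinate added for each point, the formula reads:

```latex
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}
```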

 
 

A point in space (3D) has three coordinates.

 

But how can we use this to determine whether two descriptors describe the same face? If you stretch your imagination a bit, you can think of each descriptor as a kind of "point" with 128 coordinates. To determine whether two descriptors are reasonably similar, we check whether they are "close" to each other by calculating the distance between them with a version of the distance formula that takes all 128 coordinates into account. If the distance between the descriptors is small (less than a threshold chosen in advance), then we've got a match! The pictures probably show the same person.
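
The comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not how Interpol's system works: the function names are invented, the threshold of 0.6 is an arbitrary example value, and the three descriptors are made-up numbers rather than measurements of real faces.

```python
import math


def descriptor_distance(a, b):
    """Distance formula extended to descriptors with any number of coordinates."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # Square each coordinate difference, add them all up, take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_match(a, b, threshold=0.6):
    """True if the descriptors are closer than the threshold (assumed value)."""
    return descriptor_distance(a, b) < threshold


# Made-up descriptors with 128 coordinates each, for illustration only.
reference = [0.0] * 128
same_person = [0.01] * 128  # differs slightly in every coordinate
stranger = [0.1] * 128      # differs more in every coordinate

print(is_match(reference, same_person))  # True
print(is_match(reference, stranger))     # False
```

A lower threshold makes the search stricter, with fewer false matches but more missed ones. A higher threshold does the opposite.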

 
 

Pythagoras' theorem, or its cousin the distance formula, is one way of determining whether two descriptors are similar to each other. Modern face recognition systems use several other ways as well. You can read more about them in the further reading at the end of this article.

Benefits and fun

Facial recognition algorithms are not uncontroversial. People have been wrongly arrested after a facial recognition algorithm misidentified them. Use of the technology in public settings can infringe on our privacy. And it is a fact that the algorithms are better at identifying white men than women and dark-skinned people, which is a problem from a discrimination perspective. But the ideas behind facial recognition can also be useful. They helped the police in Argentina to arrest Kristian Danev. They are used to locate tumours in X-ray images, and they are used daily on our mobile phones and in social media. Mobile companies like Apple, for example, use facial recognition as a login method. Facebook uses the technology to locate faces in photos, so we can tag our friends. And video calling apps locate the contours of your face so you can use fun effects like these.

 
 

Next time you use one of these services, spare a thought for all the dedicated programmers, and a grateful one for Pythagoras.

References and further reading

Alpman, Marie (2020) Ansiktsigenkänning - vem ser dig? Forskning och Framsteg

CTK (2018) Dopadení nejhledanějšího zločince z Česka: V Argentině ho odhalil počítačový program, Blesk.cz

Geitgey, Adam (2016) Modern Face Recognition with Deep Learning, Medium 

Hao Wu et al. (2021) Face Recognition Based on Haar Like and Euclidean Distance, Journal of Physics: Conference Series, IOP Publishing, doi:10.1088/1742-6596/1813/1/012036

Hill, Kashmir (2020) Another Arrest, and Jail Time, Due to a Bad Facial Recognition Match, New York Times

Interpol (2018) INTERPOL facial recognition nets most wanted murder fugitive

NEC (2020) A brief history of facial recognition

PXL Vision (2021) Machine learning and how it applies to facial recognition technology, Blog 

Sunil Swamilingappa Harakannanavar et al. (2018) Performance Evaluation of Face Recognition based on Multiple Feature Descriptors using Euclidean Distance Classifier, Int. J. Advanced Networking and Applications, Volume 10, Issue 03, Pages 3864-3879, ISSN 0975-0290

Thomas (2019) Building a face recogniser: traditional methods vs deep learning

Demos:

This demo allows you to find landmarks in an uploaded photo: https://reconess.com/products/landmarks

This demo allows you to find both landmarks and a grid in the live feed from your webcam: https://storage.googleapis.com/tfjs-models/demos/facemesh/index.html

This demo lets you find grids in an uploaded photo: https://codepen.io/mediapipe/details/KKgVaPJ
