Scientists teach the neural network to carry out video facial recognition — using a single photo

Researchers at the Higher School of Economics have proposed a new method of recognizing people in video with the help of a deep neural network. The approach does not require a large number of photographs and achieves significantly higher recognition accuracy than existing methods, even if only one photo of a person is available. The results of the work have been published in the articles 'Fuzzy Analysis and Deep Convolution Neural Networks in Still-to-Video Recognition' and 'Unconstrained Face Identification Using Maximum Likelihood of Distances Between Deep Off-the-shelf Features.'

Facial recognition technologies have been developing rapidly over the past few years. These technologies, for the verification and identification of individuals, are used in a variety of areas, from law enforcement agencies in the fight against terrorism to social networks and mobile applications. Research groups at international corporations and leading world universities are continuously experimenting with data and the instruments themselves in order to increase recognition accuracy.

Recognition can take place in many ways, but the best results have recently been achieved with the help of high-precision neural networks. The more training images are presented to the neural network, the better this process will work. The network extracts key facial features and then uses this knowledge when recognizing unknown images.
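As a rough illustration of how extracted features are used to recognize an unknown image, the sketch below assigns a query face to the closest reference photo. It assumes the network has already turned each face into a fixed-length feature vector. The random vectors, the 128-dimensional size and the function names are hypothetical stand-ins, not part of the researchers' software.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so distances ignore vector magnitude."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def nearest_identity(query, gallery):
    """Return the index of the gallery row closest to the query vector.

    query:   (D,) feature vector of an unknown face
    gallery: (C, D) feature vectors, one reference photo per person
    """
    q = l2_normalize(query[None, :])
    g = l2_normalize(gallery)
    distances = np.linalg.norm(g - q, axis=1)  # Euclidean distance to each person
    return int(distances.argmin())

# Hypothetical data: random vectors stand in for features from a pretrained network.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(4, 128))
query = gallery[1] + 0.05 * rng.normal(size=128)  # slightly perturbed view of person 1
print(nearest_identity(query, gallery))
```

With only one reference photo per person, this nearest-neighbor rule has nothing else to rely on, which is why a single noisy video frame can easily be assigned to the wrong person.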

Photo datasets for training neural networks are now increasingly easy to access. Under constrained observation conditions (photos with the same face orientation, illumination, etc.), the accuracy of the algorithms has long since reached the human level of face recognition. However, achieving high accuracy on video data collected under unconstrained conditions, with variable illumination, rotation and size, is much more of a challenge for researchers.

'The network can recognize a well-known actor with 100% accuracy, because the number of available images of the actor is estimated to be in the millions. However, this does not mean that, with the transfer of knowledge accumulated in a neural network, it can adapt and recognize a person of whom only a single photo is available as a training sample,' explains Professor Savchenko from the Department of Information Systems and Technologies, HSE University, Nizhny Novgorod.

In order to solve this problem, researchers from the Higher School of Economics (HSE) used the theory of fuzzy sets and probability theory to develop a video recognition algorithm. For several well-known neural network architectures, such as VGGFace, VGGFace2, ResFace and LightCNN, this algorithm significantly improves the accuracy of real-time face identification in video from a small number of images (by 2-6% compared to earlier experiments).

As a test database, the HSE researchers used datasets traditionally employed to evaluate video facial recognition methods: IJB-A (IARPA Janus Benchmark A) and YTF (YouTube Faces). These sets include freely available images of famous people (actors, politicians, public figures), gathered from open sources in unconstrained environments at different times. In the most complex experiment, the developed algorithm was used to recognize people in YouTube videos using several photographs of the same people from another dataset, LFW (Labeled Faces in the Wild), which has a higher resolution. The photos themselves were taken in different places at various times (from the 1970s to the 2010s).

The essence of this new approach is to use information on how the reference photos are related, namely how close or far apart they are. The connection (that is, the distance in the mathematical model) is smaller between similar individuals and greater between dissimilar ones. Knowing to what degree people differ from each other enables the system to correct errors in the process of recognizing video frames.

'The algorithm estimates to what degree one frame is closer to one person, and to what degree the other frame is closer to the next person. Then it compares how similar the training still photos of these two people are to each other. It then adds the third person to the mix and evaluates to whom they are more similar – the first or second. It then corrects recognition errors,' explains Professor Savchenko.
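The idea described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published algorithm: it assumes faces are already represented as feature vectors. For each candidate person it adds a penalty when a frame's distances to all reference photos disagree with that candidate's own distances to the same photos. The function names and the `weight` parameter are hypothetical.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def identify_video(frames, refs, weight=1.0):
    """Return the index of the reference person that best matches a video.

    frames: (T, D) feature vectors, one per video frame
    refs:   (C, D) feature vectors, one still photo per person
    weight: hypothetical trade-off between direct distance and consistency
    """
    d_frame_ref = pairwise_dist(frames, refs)  # (T, C): frame to each person
    d_ref_ref = pairwise_dist(refs, refs)      # (C, C): person to person
    # mismatch[t, c]: if frame t shows person c, its distances to every
    # reference i should resemble the distances from reference c to i.
    gap = d_frame_ref[:, None, :] - d_ref_ref[None, :, :]  # (T, C, C)
    mismatch = (gap ** 2).mean(axis=-1)                    # (T, C)
    scores = d_frame_ref + weight * mismatch
    # Sum the evidence over all frames and pick the lowest total score.
    return int(scores.sum(axis=0).argmin())
```

A call such as `identify_video(frames, refs)` returns a single decision for the whole clip. Summing over frames means that one badly lit or rotated frame is less likely to change the result than it would be if each frame were classified on its own.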

The algorithm has already been implemented in Python for desktop computers. The implementation enables the user to find and group the faces of different people in photo and video albums and to estimate a person's year of birth, gender and other parameters. An Android application prototype has also been developed that determines the age and gender of people in photos and videos. Analysis of the photo gallery enables the automatic assessment of the user's degree of social activity and identifies the user's close friends and relatives. On modern smartphones, the application prototype processes 15 frames per second. According to the researchers, their approach allows facial recognition to be carried out with higher accuracy.


Media Contact

Liudmila Mezentseva
[email protected]
@HSE_eng