Gait recognition based on neural networks

Contact person: Anton S. Konushin (ktosh@graphics.cs.msu.ru)
The project develops a neural-network solution to the gait recognition problem: given a database of people and a video in which a person's full-height walk is recorded, the task is to identify that person. To recognize motion rather than appearance, maps of optical flow (OF) between consecutive frames are used as the main source of information. A method based on the OF over the full body already achieves high classification accuracy; nevertheless, incorporating information about the human pose improves the algorithm further. Detecting the key points of the body and considering the OF in the neighborhood of these key points allows the method to outperform state-of-the-art results.
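The project text does not fix a particular OF algorithm, so the following minimal sketch, which uses OpenCV's Farneback dense flow purely as an illustrative stand-in, shows how per-frame OF maps between consecutive frames can be obtained:

    # Illustrative only: dense optical flow between consecutive frames.
    # The OF method used in the project is not specified; OpenCV's
    # Farneback flow is an assumption made for this sketch.
    import cv2
    import numpy as np

    def optical_flow_maps(frames):
        """Return an (N-1, H, W, 2) array of OF maps for N frames."""
        grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
        flows = []
        for prev, curr in zip(grays, grays[1:]):
            flow = cv2.calcOpticalFlowFarneback(
                prev, curr, None,
                pyr_scale=0.5, levels=3, winsize=15,
                iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
            flows.append(flow)  # H x W x 2: (dx, dy) per pixel
        return np.stack(flows)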

The proposed recognition algorithm consists of three steps:
  1. Frame-by-frame pose estimation and calculation of optical flow maps around the key points of the figure (see the patch-cropping sketch after this list).
  2. Training a deep convolutional neural network to predict the subject from patches cropped from the OF maps (a minimal training sketch also follows the list).
  3. Using the trained network as a gait feature extractor and classifying people by the obtained descriptors.
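As a sketch of the patch extraction in step 1 (the patch size is an illustrative choice, and the (x, y) key points are assumed to come from an external pose estimator):

    # Hypothetical sketch of step 1: cropping fixed-size OF patches
    # around body key points. Patch size and key-point source are
    # assumptions made for illustration.
    import numpy as np

    def crop_patches(flow, keypoints, size=32):
        """Cut a (size, size, 2) OF patch around each (x, y) key point."""
        h, w = flow.shape[:2]
        half = size // 2
        patches = []
        for x, y in keypoints:
            # shift the window inward near the borders so it stays valid
            x0 = min(max(0, int(round(x)) - half), w - size)
            y0 = min(max(0, int(round(y)) - half), h - size)
            patches.append(flow[y0:y0 + size, x0:x0 + size])
        return np.stack(patches)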
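Step 2 trains a deep CNN as a subject classifier (the experiments below use VGG and Wide ResNet architectures); the tiny PyTorch model here is only a stand-in to show the training objective:

    # Hypothetical sketch of step 2: training a CNN to predict the
    # subject identity from OF patches. The real project uses VGG /
    # Wide ResNet; this toy network is illustrative only.
    import torch
    import torch.nn as nn

    class PatchNet(nn.Module):
        def __init__(self, n_subjects, feat_dim=128):
            super().__init__()
            self.features = nn.Sequential(   # input: 2 x 32 x 32 OF patch
                nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Flatten(), nn.Linear(64 * 8 * 8, feat_dim), nn.ReLU())
            self.classifier = nn.Linear(feat_dim, n_subjects)

        def forward(self, x):
            return self.classifier(self.features(x))

    def train_step(model, optimizer, patches, labels):
        """One optimization step of subject classification."""
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(patches), labels)
        loss.backward()
        optimizer.step()
        return loss.item()

After training, the classifier head is discarded and the output of model.features serves as the gait descriptor for step 3.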
The method was evaluated on two popular gait datasets: the TUM-GAID dataset and the multi-view OU-ISIR database. The results of the experiments are shown below.

Results on the TUM-GAID dataset (Rank-1 and Rank-5 accuracy, %):

  Architecture        Classifier  Metric  Rank-1  Rank-5
  VGG + blocks        kNN         L1       97.52   99.89
  VGG + pose          kNN         L1       98.81  100.00
  Wide ResNet + pose  kNN         L2       98.81   99.78
  Wide ResNet + pose  kNN         L1       99.78   99.89
Results on the multi-view OU-ISIR database (accuracy, %). The gallery view is fixed at 85 degrees and the probe views are 55, 65, and 75 degrees.

  Architecture  Classifier  Metric  55°   65°   75°
  Wide ResNet   kNN         L1      92.8  96.2  97.8
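Rank-1 and Rank-5 above denote the share of probe sequences whose true identity appears among the labels of the 1 or 5 nearest gallery descriptors. A minimal sketch of this evaluation, assuming precomputed descriptor matrices and the L1 metric from the tables:

    # Minimal sketch: Rank-k accuracy of kNN matching with the L1
    # metric. `gallery` and `probe` are (n, d) descriptor matrices
    # assumed to be produced by the trained network.
    import numpy as np
    from scipy.spatial.distance import cdist

    def rank_k_accuracy(gallery, g_labels, probe, p_labels, ks=(1, 5)):
        dists = cdist(probe, gallery, metric="cityblock")  # L1 distance
        ranked = g_labels[np.argsort(dists, axis=1)]       # nearest first
        return {k: float(np.mean([t in row[:k]
                                  for t, row in zip(p_labels, ranked)]))
                for k in ks}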
In addition, the transferability of the algorithm between the databases, as well as the influence of the key-point set and the video length on recognition quality, are investigated.

Team

  • Anna Sokolova

Publications

Acknowledgements

This work was supported by RFBR grant #16-29-09612, "Research and development of person identification methods based on gait, gestures and body build in video surveillance data".