Domain-Specific Face Synthesis for Video Face Recognition from a Single Sample Per Person

Fania Mokhayeri, Eric Granger, Guillaume-Alexandre Bilodeau

Outline of the proposed research

The performance of face recognition systems can decline significantly because faces captured in an unconstrained operational domain over multiple video cameras follow a different underlying data distribution than faces captured under controlled conditions in the enrollment domain with a still camera. This is particularly true when individuals are enrolled in the system using a single reference still. To improve the robustness of these systems, the reference gallery set can be augmented with synthetic faces generated from the original still. However, without knowledge of the operational domain, many synthetic images must be generated to account for all possible capture conditions. This research aims to augment the reference set with a synthetic set generated from the original reference still, while taking into account the intra-class variation information transferred from a generic set in the operational domain. To this end, a new approach is proposed that exploits the discriminant information of the generic set in the face synthesis process. The new algorithm, called domain-specific face synthesis (DSFS), maps representative variation information from the generic set in the operational domain to the original reference stills in the enrollment domain.

The DSFS technique involves two main steps: (1) characterizing capture condition information from the operational domain, and (2) generating synthetic face images based on the information obtained in the first step. Prior to operation (during the camera calibration process), a generic set is collected from video captured in the operational domain. A compact and representative subset of face images is selected by clustering this generic set in a capture condition space defined by pose, illumination, and blur. The 3D model of each reference still image is reconstructed via a 3D morphable model and rendered based on the pose representatives.
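The selection of representatives by clustering in a capture condition space can be sketched as follows. This is a minimal illustration, not the clustering procedure used in DSFS: it assumes each generic-set face has already been reduced to a hypothetical three-value descriptor (pose, illumination, blur, each scaled to [0, 1]), it swaps in plain k-means with a deterministic farthest-point initialization, and the name `select_representatives` and the toy values are invented for the example.

```python
import math

def select_representatives(features, k, max_iters=100):
    """Cluster capture-condition vectors with plain k-means and return the
    index of the sample closest to each cluster centre."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centres = [list(features[0])]
    while len(centres) < k:
        farthest = max(features, key=lambda f: min(math.dist(f, c) for c in centres))
        centres.append(list(farthest))

    def assign(current):
        # Label every sample with the index of its nearest centre.
        return [min(range(k), key=lambda j: math.dist(f, current[j])) for f in features]

    for _ in range(max_iters):
        labels = assign(centres)
        updated = []
        for j in range(k):
            members = [f for f, lab in zip(features, labels) if lab == j]
            # Keep the old centre if a cluster happens to be empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centres[j])
        if updated == centres:
            break
        centres = updated

    # Pick, per cluster, the real sample nearest to the centre.
    labels = assign(centres)
    representatives = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            representatives.append(
                min(members, key=lambda i: math.dist(features[i], centres[j])))
    return sorted(representatives)

# Hypothetical descriptors: [pose, illumination, blur] per generic-set face.
generic_set = [
    [0.00, 0.90, 0.10], [0.05, 0.85, 0.10], [0.02, 0.95, 0.12],  # frontal, bright, sharp
    [0.80, 0.30, 0.20], [0.85, 0.25, 0.20], [0.90, 0.30, 0.25],  # turned, dim
    [0.10, 0.50, 0.90], [0.12, 0.55, 0.85], [0.08, 0.50, 0.95],  # frontal, blurred
]
reps = select_representatives(generic_set, k=3)
```

Returning actual samples rather than cluster means matters here, because the later synthesis steps need real faces whose pose and lighting can be transferred onto the reference still.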
In the final synthesis step, the illumination-dependent layers of the lighting representatives are extracted and projected onto the rendered reference images with the same pose. In this manner, domain-specific variations are effectively transferred onto the reference still images. The main advantage of the proposed approach is its ability to provide a compact set that accurately represents the original reference face with relevant intra-class variations in pose, illumination, motion blur, etc., corresponding to the capture conditions in the operational domain. In a particular implementation for still-to-video face recognition, the original and synthetic face images are employed to design a structural dictionary with powerful variation representation ability for sparse representation-based classification (SRC). The main steps of the proposed domain-invariant still-to-video face recognition with dictionary augmentation are summarized as follows:

  • Step 1: Generation of Synthetic Facial ROIs
  • Step 2: Augmentation of Dictionary
  • Step 3: Classification
  • Step 4: Validation
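Steps 2 and 3 can be illustrated with a toy sketch of SRC over an augmented dictionary. It is not the structural dictionary design of the paper: it assumes faces are already encoded as short unit-norm feature vectors (hypothetical values below), uses a basic ISTA solver for the l1-regularized least-squares problem as a stand-in for whatever solver is actually used, and classifies by the smallest class-wise reconstruction residual. The function names are invented for the example.

```python
import math

def reconstruct(atoms, coeffs):
    """Linear combination sum_j coeffs[j] * atoms[j]."""
    return [sum(c * a[i] for c, a in zip(coeffs, atoms)) for i in range(len(atoms[0]))]

def soft_threshold(x, t):
    """Shrink x towards zero by t (proximal operator of the l1 norm)."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def sparse_code(atoms, probe, lam=0.01, iters=500):
    """ISTA for min_a 0.5 * ||probe - D a||^2 + lam * ||a||_1, with D = atoms."""
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the gradient.
    step = 1.0 / sum(v * v for a in atoms for v in a)
    coeffs = [0.0] * len(atoms)
    for _ in range(iters):
        residual = [r - p for r, p in zip(reconstruct(atoms, coeffs), probe)]
        grad = [sum(ai * ri for ai, ri in zip(a, residual)) for a in atoms]
        coeffs = [soft_threshold(c - step * g, lam * step) for c, g in zip(coeffs, grad)]
    return coeffs

def src_classify(atoms, labels, probe, lam=0.01):
    """Return the label with the smallest class-wise residual, plus all residuals."""
    coeffs = sparse_code(atoms, probe, lam)
    residuals = {}
    for label in sorted(set(labels)):
        # Keep only the coefficients that belong to this identity.
        class_coeffs = [c if lab == label else 0.0 for c, lab in zip(coeffs, labels)]
        residuals[label] = math.dist(probe, reconstruct(atoms, class_coeffs))
    return min(residuals, key=residuals.get), residuals

# Hypothetical augmented dictionary: one reference still and one synthetic
# face per identity, each a unit-norm toy feature vector.
atoms = [
    [1.0, 0.0, 0.0, 0.0],  # identity A, reference still
    [0.8, 0.6, 0.0, 0.0],  # identity A, synthetic variation
    [0.0, 0.0, 1.0, 0.0],  # identity B, reference still
    [0.0, 0.0, 0.8, 0.6],  # identity B, synthetic variation
]
labels = ["A", "A", "B", "B"]
probe = [0.75, 0.6, 0.05, 0.0]  # video face resembling A's synthetic variation
predicted, residuals = src_classify(atoms, labels, probe)
```

In this toy setup the probe lies much closer to the synthetic atom of identity A than to A's reference still, which is the situation dictionary augmentation is meant to handle.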

Experimental results obtained with videos from the ChokePoint and COX-S2V datasets indicate that augmenting the reference gallery set of still-to-video face recognition systems using the proposed DSFS approach can provide a significantly higher level of accuracy than state-of-the-art approaches, with only a moderate increase in computational complexity.

Publications

Mokhayeri, F., Granger, E. and Bilodeau, G.-A., Domain-specific face synthesis for video face recognition from a single sample per person. IEEE Transactions on Information Forensics and Security, 14(3), pp. 757-772, 2019.

Code and trained models

We are sharing the code so that other researchers can use the proposed face synthesis technique for face recognition. The code for using the trained models can be found at: https://github.com/faniamokhayeri/DSFS.

