
Deep Feature Learning And Disentanglement Of Face Images

Posted on: 2024-04-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y D Li
Full Text: PDF
GTID: 1528307079989029
Subject: Computer Science and Technology
Abstract/Summary:
The face conveys rich personal information, including race, skin color, gender, age, identity, and expression, and is the main channel for identity recognition and emotion expression. It plays an important role in interpersonal communication and human-computer interaction, and has great research value and wide application prospects. Deep feature learning is currently the dominant approach to face feature learning and has made great progress in face recognition, facial expression recognition, facial age estimation, forged face detection, facial attribute editing, and related tasks. In recent years, the ever-increasing demand for solving complex challenges in practical applications has triggered a boom in facial deep feature learning. This dissertation proposes a series of deep feature learning and disentanglement methods for different face-related challenges, described as follows.

In Chapter 3, a cropping- and attention-based approach for masked face recognition is proposed. The global COVID-19 pandemic has made people realize that wearing a mask is one of the most effective ways to protect against viral infection, which poses serious challenges for existing face recognition systems. Masked face recognition faces two main challenges: 1) face detection systems struggle to accurately detect masked face images, and 2) most of the discriminative facial features are severely corrupted. To tackle these difficulties, a new masked face recognition method is proposed that integrates a cropping-based approach with the Convolutional Block Attention Module (CBAM). The optimal cropping is explored for each case, while CBAM is adopted to focus on the regions around the eyes. Comprehensive experiments on several benchmark datasets show that the proposed approach significantly improves masked face recognition performance compared with other state-of-the-art approaches.

In Chapter 4, a multi-modal Transformer for facial expression recognition (FER) in the wild is proposed. FER in the wild is particularly challenging because of unconstrained variations (occlusion, pose, illumination, etc.) and annotation ambiguity caused by the subjectiveness of annotators, ambiguous facial expressions, or low-quality facial images. To address this problem, a novel multifarious supervision-steering Transformer for FER in the wild, referred to as FER-former, is proposed. Specifically, to exploit the complementary merits of the features provided by prevailing CNNs and Transformers, a hybrid stem is designed to cascade the two learning paradigms. A FER-specific Transformer encoder is devised to characterize conventional hard one-hot label-focusing tokens and CLIP-based text-oriented tokens in parallel for final classification. The extracted features are then downsampled to obtain diverse spatial cues, enabling the model to overcome occlusion and pose variations. More importantly, FER-former endows image features with text-space semantic correlations by supervising the similarity between image features and text features. Extensive experiments on popular benchmarks demonstrate the superiority of FER-former over existing state-of-the-art methods.
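As a concrete illustration of the attention component referenced in Chapter 3, the sketch below gives a minimal PyTorch-style CBAM block (channel attention followed by spatial attention). The layer sizes and class names are illustrative assumptions; the chapter's cropping strategy and backbone are not reproduced here.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: reweight feature channels using pooled descriptors."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling descriptor
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    """Spatial attention: emphasize informative locations in the feature map."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, channels):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialAttention()

    def forward(self, x):
        return self.spatial(self.channel(x))
```

In the masked-face setting, the spatial branch is the part that can learn to emphasize the uncorrupted region around the eyes.

The text-oriented supervision in Chapter 4 can be summarized as pulling each image feature toward the text embedding of its expression label. The sketch below is a minimal version of such a loss under that assumption, taking precomputed per-class text embeddings (e.g., from a CLIP text encoder) as input; the actual FER-former head, tokenization, and prompt design are not detailed in this abstract.

```python
import torch
import torch.nn.functional as F

def text_supervision_loss(image_feats, text_embeds, labels, temperature=0.07):
    """Cross-entropy over cosine similarities between image features and
    per-class text embeddings (one embedding per expression category).

    image_feats: (batch, dim) features from the image branch.
    text_embeds: (num_classes, dim) fixed text embeddings.
    labels:      (batch,) ground-truth expression indices.
    """
    image_feats = F.normalize(image_feats, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = image_feats @ text_embeds.t() / temperature  # (batch, num_classes)
    return F.cross_entropy(logits, labels)

# Illustrative usage (the weighting is an assumption, not from the dissertation):
# loss = ce_loss + lambda_text * text_supervision_loss(img_f, txt_e, y)
```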
In Chapter 5, a dual-channel feature disentanglement approach for identity-invariant facial expression recognition is proposed. Facial expression recognition is a challenging task owing to subtle inter-class differences and significant intra-class variations. To address this problem, we propose a novel dual-channel alternation training strategy, in which image pairs with different expressions from the same identity and image pairs with the same expression from different identities are alternately fed into a Siamese network for model training. Unlike previous studies, the features extracted by each branch of the Siamese network are disentangled into three feature subspaces, namely an expression-related subspace, an identity-related subspace, and a shared subspace, to reduce the potential negative effects of expression-related features being contaminated by identity components. To further enhance the ability to pull the same expressions together and push different expressions apart in the feature space, the Hilbert–Schmidt independence criterion (HSIC) is introduced to design an identity-sensitive and expression-sensitive loss function, owing to its excellent ability to measure the statistical dependence between high-dimensional representations. Comprehensive experiments on benchmark datasets demonstrate that the proposed approach produces competitive recognition results compared with state-of-the-art methods.

In summary, this dissertation presents an in-depth study of feature corruption, annotation ambiguity, and feature entanglement in masked face recognition and facial expression recognition. Targeted deep feature learning is achieved through flexible design of network structures and loss functions. Extensive experiments show that the proposed methods are effective at improving deep feature learning in a variety of application scenarios.
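For reference, the Hilbert–Schmidt independence criterion used in Chapter 5 has a standard (biased) empirical estimator, sketched below with Gaussian kernels. How this estimator is assembled into the identity-sensitive and expression-sensitive loss is specific to the dissertation and not reproduced here; the median-heuristic bandwidth is an assumption.

```python
import torch

def gaussian_kernel(x, sigma=None):
    """Pairwise Gaussian (RBF) kernel matrix for a batch of feature vectors."""
    dist = torch.cdist(x, x).pow(2)                    # squared Euclidean distances
    if sigma is None:                                  # median heuristic for bandwidth
        sigma = dist.detach().median().sqrt().clamp(min=1e-6)
    return torch.exp(-dist / (2 * sigma ** 2))

def hsic(x, y, sigma=None):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2 with centering matrix
    H = I - (1/n) * ones. Larger values indicate stronger dependence between
    the two feature sets."""
    n = x.size(0)
    k = gaussian_kernel(x, sigma)
    l = gaussian_kernel(y, sigma)
    h = torch.eye(n, device=x.device) - torch.ones(n, n, device=x.device) / n
    return torch.trace(k @ h @ l @ h) / (n - 1) ** 2

# A disentanglement loss could, for example, penalize
# hsic(expression_feats, identity_feats) to push the two subspaces toward
# independence; this particular usage is illustrative.
```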
Keywords/Search Tags:deep learning, masked face recognition, facial expression recognition, attention mechanism, feature disentanglement, multi-modal learning