| Facial expressions are one of the most common forms of nonverbal communication, used by humans to convey inner emotions and intentions and to communicate with one another. Facial expression recognition is an important way to help computers understand emotions as humans do and to narrow the gap in human-computer interaction. Compared with facial expression datasets collected in laboratory-controlled environments, in-the-wild datasets are closer to real scenes, but interference factors unrelated to facial expressions (such as occlusion and head pose variation) seriously reduce recognition accuracy. In addition, the subtle changes in facial muscles and the highly similar expression categories found in the wild further increase the difficulty of facial expression recognition. Although computer vision and artificial intelligence have made significant progress in recent years, accurately recognizing facial expressions in the wild remains a highly challenging task. Building on deep neural networks, this thesis proposes three effective methods for facial expression recognition in the wild, each targeting one of three common challenges: subtle changes in facial movements, interference from occlusion and head pose variation, and high similarity between categories. The specific research content and contributions are as follows: 1) A facial expression recognition method based on regional attention and refinement networks is proposed to address the fine-grained nature of expression features caused by small changes in facial muscles in the wild. This method captures local fine-grained facial expression representations. Discriminative features are learned through three components: multi-head local attention networks, a latent feature mining network, and a related aggregation loss. The multi-head local attention networks learn robust local salient features by accurately locating regions of interest. The latent feature mining network adaptively mines valuable
latent features from the salient local regions to obtain fine-grained semantic information from facial expression images. Finally, the boundaries between different categories are made clearer through the related aggregation loss. Experimental results on three public in-the-wild expression datasets show that the method effectively mines fine-grained semantic information from subtle facial changes and achieves excellent performance. 2) A facial expression recognition method based on fine-grained association graph representation is proposed to alleviate or eliminate the interference of occlusion and head pose variation in the wild. Inspired by the coarse-to-fine pattern of human cognition, a hierarchical attention strategy is designed. First, the proposed adaptive salient region induction module integrates spatial and positional information to adaptively highlight the locally salient regions of facial expressions. On this basis, the local fine-grained feature extraction module uses a Transformer to capture discriminative fine-grained features of the salient regions. Finally, the adaptive graph association reasoning module learns the correlations between local regions, by analogy with facial action units, and generates strongly correlated local fine-grained feature association graphs that enhance the model's discriminative ability. Extensive experiments are conducted on three public in-the-wild facial expression datasets, and the method achieves the best results on all of them. In addition, an evaluation on occlusion and pose-variation datasets demonstrates the excellent performance of the model in the wild under heavy occlusion and pose variation. 3) A facial expression recognition algorithm based on cross-image-level facial perception region comparison is proposed to address the issue of highly similar expression categories in the wild. It is the first attempt to use supervised contrastive learning to improve the perceptual contrastive
discrimination ability of expression features. The facial perception region representation module extracts rich region-aware facial expression features from a single image. The cross-image perception feature comparison module then learns by comparing features across images, which enhances their discriminative ability. A feature memory store is used to combine local and pixel features, achieving cross-level interactive perception that jointly draws on pixel-level, region-level, and image-level features. In addition, a hard-example sampling strategy is combined with cross-image perceptual feature contrastive learning to pull similar features closer and push dissimilar features apart, addressing the problem of highly similar categories in facial expression recognition in the wild. Extensive experiments on three public in-the-wild facial expression datasets show that the method achieves the best recognition accuracy compared with state-of-the-art methods. |
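
The third contribution relies on supervised contrastive learning to pull same-category features together and push different-category features apart. As an illustration only, the sketch below implements the commonly used supervised contrastive loss over a batch of image-level embeddings in plain Python. The function names, the temperature value, and the toy data are assumptions made for this example; the thesis's actual loss, its feature memory store, and its hard-example sampling strategy are not specified in the abstract and are not reproduced here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (an assumed form, not the thesis's own).

    For each anchor, every other sample with the same label is a positive;
    all remaining samples act as negatives. Anchors without any positive
    in the batch are skipped.
    """
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarities between the anchor and all other samples.
        logits = {j: dot(feats[i], feats[j]) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp over the anchor's logits.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(s - max_logit) for s in logits.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0
```

With toy two-dimensional embeddings, labels that agree with the geometric clusters give a loss close to zero, while labels that cut across the clusters give a much larger loss. This is the behavior that a contrastive objective exploits to separate highly similar expression categories.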