About 55% of the information in everyday human communication is conveyed through facial expressions, which carry rich emotional information. By observing and recognizing natural human facial expressions, machines can better understand human emotions and ways of communicating. With the continuing development of human-computer interaction, expression recognition has considerable research value and practical significance for computer vision (CV) and artificial intelligence (AI).

In recent years, deep learning has opened new opportunities in computer vision. As research on expression recognition has deepened, the range of deep network models applied to the task has also grown. Most deep-learning methods are end-to-end: they extract high-level semantic features from expression images, which improves recognition accuracy. Among these methods, convolutional neural networks (CNNs) and Transformer networks are the most widely used. Thanks to local receptive fields and weight sharing, a CNN can fully extract the local features of an expression. A Transformer, relying on self-attention, has a clear advantage in extracting global expression features. However, both kinds of network have large numbers of parameters, which slows training and raises hardware requirements, and neither can extract local and global expression features at the same time.

To address these problems, this thesis studies expression recognition based on CNN and Transformer models. Through a series of improvements, the two base models are enabled to extract both the local and the global features of the expression image, while their structure is simplified to keep the networks lightweight. The goal is to improve the generalization ability of expression recognition models on natural facial expressions, in terms of both accuracy and parameter count. The main work of the thesis is as follows:

(1) CNNs pay little attention to global expression features. Deep CNNs have many parameters, which limits their application scenarios, and lightweight CNN designs tend to weaken the extraction of facial expression features. To solve these problems, an expression recognition network that fuses global features is proposed. The model is built on the lightweight MobileNeXt network and improves its feature-extraction ability. First, the network's sandglass modules improve the flow of feature information through the network and reduce the loss of expression features during transmission. Second, Ghost modules replace the 1×1 convolutions to reduce the parameters of the feature-extraction layers. Third, a Drop-Activation layer replaces the ReLU layer in the sandglass module to improve the generalization ability and accuracy of the network. Finally, the SGE (Spatial Group-wise Enhance) attention mechanism is introduced to strengthen the network's refinement of expression features. Experiments on the FER2013, RAF-DB, and CK+ datasets verify the effectiveness of the proposed network. Compared with the baseline MobileNeXt, its accuracy on FER2013, RAF-DB, and CK+ increases by 2.6%, 6.5%, and 7.15%, respectively, while Params and FLOPs increase by only 0.85 M and 2.93 M.

(2) To address the Transformer's limited ability to process local features, an improved network based on Mobile-Former is proposed. First, Mobile-Former is used as the base network so that the model can combine local and global features when recognizing expressions. Second, an ACmix module replaces the original stem module so that the network has a sufficiently large receptive field when it first extracts features from the input image. Finally, this thesis proposes a lighter and more efficient Mobile sub-module to reduce the number of parameters. Experiments on the RAF-DB and CK+ datasets verify the effectiveness of the proposed network. Compared with the baseline Mobile-Former, its accuracy on RAF-DB and CK+ increases by 3.03% and 3%, respectively, while Params and FLOPs decrease by 1.05 M and 76.82 M.

(3) This thesis designs a facial expression recognition system built on the expression recognition models, with the MTCNN network used for face detection. At run time, the system uses MTCNN to detect faces in an image or video sequence and generate face bounding boxes. The expression recognition model then performs expression recognition on the region inside each bounding box and outputs the result.
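To make the parameter savings of the Ghost substitution in contribution (1) concrete, the sketch below counts the weights in one stride-1 sandglass block. It assumes MobileNeXt's layout (3×3 depthwise, 1×1 reduction, 1×1 expansion, 3×3 depthwise). It also assumes a GhostNet-style module that replaces each 1×1 convolution, with ratio s = 2 and 3×3 cheap depthwise operations. The channel width C = 192 and reduction ratio t = 4 are hypothetical. Biases and BatchNorm are ignored, and the numbers are illustrative, not the thesis's reported figures.

```python
def conv_params(c_in: int, c_out: int, k: int = 1) -> int:
    """Weights of a dense k x k convolution (bias and BatchNorm ignored)."""
    return c_in * c_out * k * k


def depthwise_params(c: int, k: int = 3) -> int:
    """Weights of a k x k depthwise convolution over c channels."""
    return c * k * k


def ghost_params(c_in: int, c_out: int, s: int = 2, d: int = 3) -> int:
    """Ghost module standing in for a 1x1 conv with c_out output channels.

    A 1x1 primary conv produces m = c_out // s "intrinsic" maps; (s - 1)
    cheap d x d depthwise operations per intrinsic map generate the rest.
    """
    if c_out % s:
        raise ValueError("c_out must be divisible by s")
    m = c_out // s
    return conv_params(c_in, m) + (s - 1) * m * d * d


def sandglass_params(c: int, t: int, use_ghost: bool = False) -> int:
    """Stride-1 sandglass block: dw3x3 -> 1x1 reduce -> 1x1 expand -> dw3x3."""
    hidden = c // t
    one_by_one = ghost_params if use_ghost else conv_params
    return (depthwise_params(c)
            + one_by_one(c, hidden)   # reduction to c / t channels
            + one_by_one(hidden, c)   # expansion back to c channels
            + depthwise_params(c))


if __name__ == "__main__":
    # Hypothetical width C = 192 and reduction ratio t = 4.
    print(sandglass_params(192, 4), sandglass_params(192, 4, use_ghost=True))
```

Under these assumptions the block drops from 21,888 to 13,752 weights. Nearly all of the saving comes from the two 1×1 layers, because the depthwise layers are unchanged.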
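The Drop-Activation layer in contribution (1) can be sketched in NumPy as follows, applied to ReLU. During training, the nonlinearity of each unit is randomly replaced by the identity. At inference, the network uses the expected activation. Here `p` is assumed to be the probability of dropping the nonlinearity (a convention chosen for this sketch). Any value of `p` shown is illustrative, not the thesis's setting.

```python
import numpy as np


def drop_activation(x, p, training=True, rng=None):
    """Drop-Activation on ReLU (sketch; p = probability of using identity).

    Training: each unit independently uses ReLU with probability 1 - p and
    the identity with probability p.
    Inference: the expectation p * x + (1 - p) * relu(x).
    """
    x = np.asarray(x, dtype=float)
    relu = np.maximum(x, 0.0)
    if not training:
        return p * x + (1.0 - p) * relu
    rng = np.random.default_rng() if rng is None else rng
    keep_nonlinear = rng.random(x.shape) >= p  # True with probability 1 - p
    return np.where(keep_nonlinear, relu, x)
```

At inference, p·x + (1 − p)·ReLU(x) is a leaky ReLU with negative slope p. The deterministic network therefore needs no extra parameters.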
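The SGE attention in contribution (1) can be sketched for a single feature map of shape (C, H, W) as follows:

- Split the channels into G groups and use each group's spatial mean as a descriptor.
- Take the descriptor's dot product with the feature vector at every position and normalize it over positions.
- Scale and shift it with per-group parameters γ and β, then pass it through a sigmoid that gates that position.

The function name `sge`, the defaults γ = 1 and β = 0, and the missing batch dimension are simplifying assumptions. In a trained network, γ and β are learned.

```python
import numpy as np


def sge(x, groups, gamma=None, beta=None, eps=1e-5):
    """Spatial Group-wise Enhance on one feature map x of shape (C, H, W)."""
    c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channels must be divisible by groups")
    gamma = np.ones(groups) if gamma is None else np.asarray(gamma, dtype=float)
    beta = np.zeros(groups) if beta is None else np.asarray(beta, dtype=float)
    xg = x.reshape(groups, c // groups, h * w)     # (G, C/G, HW)
    g = xg.mean(axis=2, keepdims=True)             # group descriptor, (G, C/G, 1)
    sim = (xg * g).sum(axis=1)                     # (G, HW): similarity per position
    # Normalize the similarities over spatial positions within each group.
    sim = (sim - sim.mean(axis=1, keepdims=True)) / (sim.std(axis=1, keepdims=True) + eps)
    a = gamma[:, None] * sim + beta[:, None]       # per-group scale and shift
    gate = 1.0 / (1.0 + np.exp(-a))                # sigmoid gate in (0, 1)
    return (xg * gate[:, None, :]).reshape(c, h, w)
```

Because each gate lies in (0, 1), SGE can only attenuate positions relative to one another. It adds just two scalars per group.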
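Contribution (2) relies on Mobile-Former to combine local and global features. Below is a simplified, single-head sketch of its two-way bridge. First, a small set of global tokens attends to the flattened CNN features (Mobile→Former). Then the CNN features attend back to the updated tokens (Former→Mobile). Each direction uses a residual update. Learned projections, multiple heads, and the Former block itself are omitted, so this shows the information flow rather than Mobile-Former's exact layers. All shapes are hypothetical.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention without learned projections."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return weights @ keys_values


def bridge(local_feats, tokens):
    """One Mobile->Former then Former->Mobile exchange with residual updates.

    local_feats: (H*W, d) flattened CNN features; tokens: (M, d) global tokens.
    """
    tokens = tokens + cross_attention(tokens, local_feats)             # Mobile -> Former
    local_feats = local_feats + cross_attention(local_feats, tokens)   # Former -> Mobile
    return local_feats, tokens
```

Because the number of tokens M is small, each attention costs on the order of M·HW·d. Full self-attention over pixels would cost (HW)²·d, so this is what keeps the global path cheap.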
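The detect-then-classify flow of the system in contribution (3) can be sketched as below. The following pieces are hypothetical:

- `detect` stands in for an MTCNN wrapper assumed to return `(x1, y1, x2, y2, confidence)` tuples.
- `classify` stands in for the trained expression model, returning one score per class.
- The 0.9 confidence threshold and the seven-label order are illustrative choices.

Real MTCNN packages use their own output formats, so an adapter would be needed.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, confidence

# Hypothetical label order; the thesis does not specify one.
LABELS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def crop_face(image: np.ndarray, box: Box) -> np.ndarray:
    """Crop a bounding box from an H x W (x C) image, clipped to the image."""
    h, w = image.shape[:2]
    x1, y1, x2, y2, _ = box
    x1, y1 = max(0, int(x1)), max(0, int(y1))
    x2, y2 = min(w, int(x2)), min(h, int(y2))
    return image[y1:y2, x1:x2]


def recognize_frame(
    image: np.ndarray,
    detect: Callable[[np.ndarray], Sequence[Box]],
    classify: Callable[[np.ndarray], np.ndarray],
    min_conf: float = 0.9,
) -> List[Tuple[Box, str]]:
    """Detect faces, crop each confident box, and label its expression."""
    results = []
    for box in detect(image):
        if box[4] < min_conf:
            continue  # skip low-confidence detections
        face = crop_face(image, box)
        if face.size == 0:
            continue  # box fell entirely outside the image
        scores = classify(face)
        results.append((box, LABELS[int(np.argmax(scores))]))
    return results
```

For a video sequence, the same function would be called once per frame. In practice, each crop would be resized to the recognition model's input size inside `classify`.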