| With the rapid iteration of computer hardware and continuous innovation in artificial intelligence, modern society places higher demands on real-time human-computer communication. Because facial expressions convey most of the emotional information in daily communication, Facial Expression Recognition (FER) is used in a growing number of human-computer interaction applications, such as intelligent assisted driving, smart medical care, and intelligent robots. However, the field still faces problems on the method side: some existing algorithms have complex structures and very large numbers of parameters, which hinders deployment on embedded devices. On the data side, facial expression datasets have few training samples, uneven quality, and unevenly distributed labels, which hinders the training of large models. How to make a computer recognize facial expressions quickly and accurately is therefore the focus of current research. To address these problems, this thesis proposes two lightweight expression recognition models. First, a new FER model named AARFNet, which adaptively adjusts the receptive field, is proposed as an improvement on MobileNet v2. Integrating the CBAM attention mechanism into the inverted residual structure enhances the network's ability to refine features; an SK unit and max pooling are introduced into the linear bottleneck of AARFNet to adaptively regulate the receptive field and efficiently extract features at different scales; and the best model size is determined by varying the convolution width in lightweighting experiments. The model achieves good results on commonly used public datasets, with an accuracy of 99.79% on the Oulu-CASIA-VIS (Weak) dataset, and also performs well on occluded expression datasets. Second, after analyzing the shortcomings of the EfficientNet v1-B0 network, a new FER method named MECANet, based on multi-scale efficient channel attention, is proposed. To compensate for the weakness of EfficientNet v1-B0 in multi-scale feature extraction and integration, a multi-scale feature fusion structure is designed, which improves the granularity of the extracted features; applying the ECA attention mechanism to optimize the structure enhances the ability to learn effective channel features; and a width coefficient of 0.5 reduces the number of network parameters by 26.08%. The new method balances model size against recognition rate and achieves good results on commonly used public datasets and on an occluded expression dataset, with an accuracy of 98.98% on the CK+ dataset. Finally, the performance of both models is further verified through an ablation study on the JAFFE dataset, and they are compared with state-of-the-art methods on CK+, JAFFE, and Oulu-CASIA-VIS (under three different light intensities). The experimental results show that the two models achieve higher accuracy with simpler structures and fewer parameters, which supports the effectiveness, generality, and superiority of the two approaches proposed in this thesis. |
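As an illustration of the ECA attention mechanism mentioned above, the following is a minimal pure-Python sketch of a single efficient channel attention step as it is commonly described: global average pooling per channel, a 1D convolution across neighbouring channels, a sigmoid, and channel-wise rescaling. It is not the MECANet module itself. The function names, the nested-list `[C][H][W]` layout, and the uniform placeholder kernel are assumptions made for this example; in a trained network the kernel weights are learned, and the adaptive kernel-size rule with `gamma=2` and `b=1` follows the usual ECA defaults rather than anything stated in the abstract.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: (log2(C) + b) / gamma, truncated and bumped to the next odd value if even."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca_weights(feature_map, kernel=None):
    """Return one attention weight in (0, 1) per channel of a [C][H][W] feature map."""
    channels = len(feature_map)
    # 1) Squeeze: global average pooling yields one descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # 2) Local cross-channel interaction: zero-padded 1D convolution over the channel axis.
    if kernel is None:
        k = eca_kernel_size(channels)
        kernel = [1.0 / k] * k  # hypothetical placeholder; learned in a real model
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(channels)
    ]
    # 3) Sigmoid gate turns each mixed descriptor into a weight.
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]


def apply_eca(feature_map, kernel=None):
    """Rescale every channel of the feature map by its attention weight."""
    weights = eca_weights(feature_map, kernel)
    return [
        [[v * w for v in row] for row in ch]
        for ch, w in zip(feature_map, weights)
    ]
```

Because the only learnable part is a short 1D kernel shared across all channels, this kind of attention adds very few parameters, which is consistent with the lightweight goal described in the abstract.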