| Facial expressions are a primary channel of emotional expression in human communication and are of great significance in human-computer interaction. With the rapid development of artificial intelligence, facial expression recognition has broad application prospects in safe driving, medical monitoring, online education, and advertising and marketing. At present, there are two main deep-learning approaches to video facial expression recognition: (1) Classifying video facial expressions with a three-dimensional (3D) convolutional neural network. This approach is limited by its large number of parameters and heavy computation, which makes it difficult to deploy in real scenarios. (2) Using a cascaded model that combines a convolutional neural network with a recurrent neural network to extract the spatial and temporal features of a video, which a classifier then maps to expressions. This approach is computationally efficient, but extracting the most discriminative facial features remains an open difficulty. To address these problems, the main work of this paper is as follows. To reduce the excessive parameter count of 3D convolutional neural networks, a 3D residual network model based on depthwise separable convolution is proposed. First, a residual network is taken as the base model, and a 3D residual network is built on it, exploiting the ability of 3D convolution to extract spatial and temporal features simultaneously. Second, 3D depthwise separable convolution is proposed to factorize the 3D convolution operations in the model's residual modules, which reduces model complexity. Comparative experiments show that introducing 3D depthwise separable convolution greatly reduces the number of parameters and the computational cost at the expense of a small loss in recognition performance. To address the difficulty of extracting the most discriminative facial features in cascaded models, a
cascaded model of a residual network and a gated recurrent unit (GRU) network based on attention mechanisms is proposed. First, a lightweight channel attention module is embedded in ResNet18 to strengthen the feature channels most relevant to facial expressions. Second, a temporal attention module is added to the GRU network used for sequential feature extraction, which increases the attention the GRU pays to key video frames. Experiments show that adding the two attention modules effectively improves model performance and focuses the model on discriminative facial features. Based on the proposed models, a facial expression recognition system is designed that recognizes facial expressions from both local video files and real-time video. In system tests at 1080p resolution under normal lighting, the recognition accuracy for every expression category exceeds 90%, and it remains above 78.57% in dim 720p scenes. Recognition speed exceeds 12.23 FPS for video files on the local hard disk and 48.02 FPS for real-time video. |
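
The parameter saving attributed above to 3D depthwise separable convolution can be illustrated with a short count. The sketch below assumes the common factorization into a per-channel k x k x k depthwise convolution followed by a 1 x 1 x 1 pointwise convolution; the abstract does not specify the exact variant or any layer sizes, so the function names and the channel and kernel values are hypothetical and chosen only for illustration.

```python
def conv3d_params(c_in, c_out, k, bias=False):
    """Weight count of a standard 3D convolution with a k x k x k kernel."""
    return c_in * c_out * k ** 3 + (c_out if bias else 0)


def separable_conv3d_params(c_in, c_out, k, bias=False):
    """Weight count of a depthwise (one k x k x k filter per input channel)
    convolution followed by a pointwise (1 x 1 x 1) convolution."""
    depthwise = c_in * k ** 3 + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes (not taken from the paper).
c_in, c_out, k = 64, 64, 3

standard = conv3d_params(c_in, c_out, k)
separable = separable_conv3d_params(c_in, c_out, k)

# Without bias terms the ratio simplifies to 1/c_out + 1/k**3.
ratio = separable / standard

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"ratio:     {ratio:.4f}")
```

Under these assumed sizes the factorized layer keeps roughly 5% of the weights of the standard layer. The saving grows with the number of output channels and with the kernel size, which is consistent with the trade-off described in the abstract: a much smaller model in exchange for a small drop in recognition performance.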