| This thesis studies facial expression recognition in video sequences. First, it proposes a statistical feature extraction and classification algorithm based on a hybrid input. The novelty of the algorithm is that features of the video sequence are extracted with an expression recognition model trained on a static expression database; statistics of these features are then computed, and the expression is classified with a linear SVM. The thesis also proposes a hybrid network input composed of pre-processed grayscale face images and the corresponding LBP maps of the face. To make better use of the temporal features in facial expression video, the thesis proposes FP-VGGGRU, an end-to-end deep spatiotemporal facial expression recognition network that takes into account the contribution of features at different network levels. The network is simple in structure, easy to extend, and performs well on dynamic expression recognition. Finally, model fusion based on a weighted voting mechanism is used to predict the facial expression in a video sequence by combining the spatial information of the face images with the temporal information of the video. During training, the thesis mainly uses Softmax loss and Island loss to learn facial feature representations, and analyses the importance of dynamically adjusting the relative weights of the loss functions for recognition performance. In addition, a data augmentation method based on sequence reordering is proposed to improve the performance of the algorithm. Experiments on the dynamic expression dataset AFEW 7.0 show that the proposed fusion model achieves good recognition results. |
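
As an illustration of two steps named in the abstract, the following minimal sketch pools per-frame features into a video-level statistical descriptor and fuses the outputs of several models by weighted voting. It is not the thesis implementation. The function names `video_statistics` and `weighted_vote` are hypothetical, the choice of statistics (mean, standard deviation, minimum, maximum) is an assumption because the abstract does not list them, and the fusion weights are placeholder values.

```python
from statistics import mean, pstdev


def video_statistics(frame_features):
    """Pool per-frame feature vectors into one fixed-length video descriptor.

    Assumption: the statistics are mean, population standard deviation,
    minimum and maximum per feature dimension. The abstract does not say
    which statistics the thesis actually uses.
    """
    columns = list(zip(*frame_features))  # one tuple per feature dimension
    descriptor = []
    for stat in (mean, pstdev, min, max):
        descriptor.extend(stat(col) for col in columns)
    return descriptor


def weighted_vote(model_probs, weights):
    """Fuse class-probability vectors from several models by weighted sum.

    Returns the index of the winning class and the fused score vector.
    The weights are hypothetical; in practice they would be tuned on
    validation data.
    """
    num_classes = len(model_probs[0])
    fused = [0.0] * num_classes
    for probs, weight in zip(model_probs, weights):
        for cls, prob in enumerate(probs):
            fused[cls] += weight * prob
    label = max(range(num_classes), key=fused.__getitem__)
    return label, fused


if __name__ == "__main__":
    # Two frames with two-dimensional features (toy values).
    print(video_statistics([[1.0, 2.0], [3.0, 6.0]]))
    # A spatial model and a temporal model voting over two classes.
    print(weighted_vote([[0.6, 0.4], [0.2, 0.8]], [0.5, 0.5]))
```

The descriptor returned by `video_statistics` has a fixed length regardless of the number of frames, which is what allows a classifier such as a linear SVM to be trained on videos of different durations.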