| As an important topic in computer vision and pattern recognition, dynamic expression recognition has attracted growing research interest. Most current dynamic expression recognition algorithms take the entire expression sequence as input and use neural networks to learn task-related features in a supervised manner. Although these methods avoid hand-designed features, the features are learned from the entire expression sequence, which contains a great deal of redundant information unrelated to facial expressions. Compared with the original expression sequence, the facial landmark sequence is a higher-level representation, and the positional changes of the landmarks follow the same pattern for the same expression. By taking the facial landmark sequence as input, the network can focus on expression-related features, which makes feature learning easier. The main contributions of this thesis are as follows: (1) To use graph convolutional networks to extract time-varying information from facial landmark sequences, a dynamic expression recognition method based on spatio-temporal graph convolutional networks is proposed. First, a landmark detection algorithm is used to obtain the coordinates and labels of the facial landmarks in the expression sequence. Then, edges between landmarks in the same frame are formed in three different ways: according to the muscle distribution of the face, according to the geometric structure of the facial organs, and by connecting each landmark to all other landmarks. Next, edges are formed between landmarks with the same label in adjacent frames, and these edges and landmarks together form a spatio-temporal graph. Finally, a spatio-temporal graph convolutional network is constructed to classify the spatio-temporal graph. Experimental results on the CK+ and Oulu-CASIA datasets show that, compared with the other two construction methods, the spatio-temporal graph based on
full connection is more discriminative. The recognition accuracy reaches 93.88% and 78.69%, respectively, which verifies the feasibility of this method. (2) In spatio-temporal graph convolutional networks, all nodes of the spatio-temporal graph fuse information according to a fixed topology, which cannot reflect the differences between expressions. To solve this problem, a self-attention mechanism is embedded into the spatial graph convolution layer of the spatio-temporal graph convolutional network and used to learn a unique topology for each expression sample. Fusing node information according to this topology provides additional information to the spatial graph convolution layer, making the extracted features more discriminative. Experimental results on the CK+ and Oulu-CASIA datasets show that the recognition accuracy of this method improves by 0.92% and 2.29%, respectively, which verifies its effectiveness. (3) To make full use of the information in the dynamic expression sequence and further improve recognition accuracy, the spatio-temporal graph formed by the landmark sequence and the peak frame image of the expression sequence are taken as the research objects. Experiments were carried out on the CK+ and Oulu-CASIA datasets using two information fusion methods: decision-level fusion based on weighted summation and feature-level fusion based on feature concatenation. The experimental results show that decision-level fusion based on weighted summation achieves higher recognition accuracy, reaching 97.25% and 90.52%, respectively, which is higher than using either the spatio-temporal graph or the peak frame image alone. In addition, this method is competitive with existing dynamic expression recognition methods. |
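
The graph construction described in contribution (1) can be illustrated with a minimal sketch. It covers only the fully connected variant of the spatial edges plus the temporal edges between same-label landmarks in adjacent frames. The function name `build_spatio_temporal_edges`, the node-numbering scheme, and the toy sizes are assumptions made for illustration and are not taken from the thesis.

```python
from itertools import combinations


def build_spatio_temporal_edges(num_frames, num_landmarks):
    """Return (spatial_edges, temporal_edges) as lists of node-id pairs.

    Node-id convention (an assumption for this sketch): landmark i in
    frame t is given the id t * num_landmarks + i.
    """
    def node(t, i):
        return t * num_landmarks + i

    # Spatial edges: connect every landmark to every other landmark
    # within the same frame (the "full connection" construction).
    spatial_edges = []
    for t in range(num_frames):
        for i, j in combinations(range(num_landmarks), 2):
            spatial_edges.append((node(t, i), node(t, j)))

    # Temporal edges: connect landmarks that share a label in
    # adjacent frames t and t + 1.
    temporal_edges = []
    for t in range(num_frames - 1):
        for i in range(num_landmarks):
            temporal_edges.append((node(t, i), node(t + 1, i)))

    return spatial_edges, temporal_edges


# Toy example: 3 frames with 4 landmarks each (hypothetical sizes).
spatial, temporal = build_spatio_temporal_edges(num_frames=3, num_landmarks=4)
print(len(spatial), len(temporal))  # prints "18 8"
```

With T frames and V landmarks, this construction yields T·V(V−1)/2 spatial edges and (T−1)·V temporal edges. The muscle-based and organ-geometry-based variants would replace only the spatial edge list with a hand-specified set of landmark pairs.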