Research On Expression Recognition Based On Cross-stream Attention Mechanism

Posted on: 2023-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: H X Chen
GTID: 2568306836972119
Subject: Electronic and communication engineering
Abstract/Summary:
Facial expression recognition is a distinctive and widely applied task in the field of pattern recognition. Because facial expressions change dynamically over time, it is of great significance to study dynamic expression recognition in combination with the texture-change information of expressions. Two-stream convolutional networks (Two-Stream ConvNets) are a commonly used model for dynamic expression recognition. However, feature extraction in the two streams is relatively independent, and the connection between the streams cannot be fully established, so Two-Stream ConvNets leave considerable room for improvement. In recent years, different forms of attention mechanisms have been introduced into convolutional neural networks, endowing the networks with the ability to learn features non-locally and providing a new idea for image classification and other tasks. This thesis proposes information interaction across the two streams of Two-Stream ConvNets by means of an attention mechanism. The main research work and results of this thesis are as follows:
(1) Because the feature map of one scale cannot spontaneously learn the dynamic correlation information of the feature map of another scale, a cross-stream attention (CSA) mechanism with a non-local operation is proposed. A CSA unit enables two feature maps from the spatial and temporal scales to learn local salient-region information from each other, so as to capture the dynamic correlation information across the two semantic strata in the spatial and temporal domains and make the features more discriminative.
(2) To address the lack of information interaction during feature extraction in the two streams of traditional Two-Stream ConvNets, the CSA unit is introduced into Two-Stream ConvNets, and a model of two-stream convolutional networks with cross-stream attention (TSCN-CSA) is proposed. The model establishes a dynamic relationship across the two streams, which can improve the accuracy and robustness of expression classification. In this thesis, VGG-16 and ResNet-50 are each used as the backbone of the proposed model. By embedding several CSA units at different positions in the model, information interaction between the features from the two scales can be achieved at any stage of the network.
(3) TSCN-CSA is used to classify the samples of two dynamic expression datasets, eNTERFACE and AFEW, to demonstrate the effectiveness of the model for facial expression recognition. The experiments are divided into two parts: analyzing the performance of traditional Two-Stream ConvNets and analyzing the performance of TSCN-CSA. The results show that embedding CSA units into Two-Stream ConvNets improves the classification accuracy on expression samples. In particular, the accuracy of TSCN-CSA based on ResNet-50 reaches 55.83% on eNTERFACE and 54.57% on AFEW, which is 3.33% and 2.09% higher, respectively, than that of traditional Two-Stream ConvNets.
(4) A dataset named Dynamic Facial Expression of Pain in Neonates (DFEPN) is established, in which the samples are labeled as "Calmness", "Crying", "Moderate pain" or "Severe pain". TSCN-CSA is used to conduct multiple ablation experiments on the DFEPN dataset, and its classification accuracy is higher than those of other commonly used expression recognition models. In particular, TSCN-CSA based on ResNet-50 reaches a classification accuracy of 66.20% in four-category neonatal pain expression recognition.
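The cross-stream attention idea in item (1) can be illustrated with a minimal sketch. The abstract does not give the exact formulation of the CSA unit, so everything below is an assumption made for illustration: feature maps are flattened to lists of per-position channel vectors, the learned embedding projections of a typical non-local block are omitted (identity), similarity is a plain dot product with softmax normalization, and the result is added back as a residual. The names `cross_attend` and `csa_unit` are hypothetical and are not taken from the thesis.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def cross_attend(query_feats, context_feats):
    """Non-local cross attention (illustrative assumption, not the thesis's exact unit).

    Every position in query_feats attends over ALL positions in context_feats,
    and the attended context vector is added to the query as a residual.
    Both arguments are lists of channel vectors, one per flattened position.
    """
    out = []
    for q in query_feats:
        weights = softmax([dot(q, k) for k in context_feats])
        attended = [
            sum(w * k[c] for w, k in zip(weights, context_feats))
            for c in range(len(q))
        ]
        out.append([qc + ac for qc, ac in zip(q, attended)])
    return out

def csa_unit(spatial_feats, temporal_feats):
    """Hypothetical symmetric unit: each stream learns from the other."""
    new_spatial = cross_attend(spatial_feats, temporal_feats)
    new_temporal = cross_attend(temporal_feats, spatial_feats)
    return new_spatial, new_temporal

# Toy feature maps: 3 flattened positions, 2 channels each.
spatial = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
temporal = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.4]]
new_spatial, new_temporal = csa_unit(spatial, temporal)
```

Because both outputs keep the shape of their inputs, a unit of this kind could in principle be placed between any pair of matching stages of the two backbones, which is consistent with the statement in item (2) that CSA units are embedded at different positions of the model.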
Keywords/Search Tags: Expression recognition, Cross-stream attention, Deep learning, Convolutional neural networks