
Research On Micro-expression Recognition Algorithms Based On Spatio-temporal Analysis With Deep Learning

Posted on: 2023-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Sun
GTID: 2568306620484654
Subject: Electronic and communication engineering
Abstract/Summary:
Micro-expressions are facial expressions that people produce spontaneously and then actively suppress when they receive a stimulus. Because micro-expressions objectively reflect a person's real emotional state, micro-expression recognition has great application value in many fields, such as national defense and security, prisoner rehabilitation, and psychotherapy. Since micro-expression movements are weak and short-lived, effective micro-expression features are difficult to extract. With the rapid advancement of deep learning, deep-learning-based micro-expression recognition models achieve better recognition results than traditional methods, but they also face new problems and challenges. Owing to the motion and temporal characteristics of micro-expressions, the extracted features contain much unnecessary information, which makes these models less robust. Because micro-expression samples are difficult to collect and label, the samples available for research are few and class-imbalanced, which makes the models prone to problems such as overfitting. To address these problems, this thesis studies micro-expression recognition based on spatio-temporal analysis with deep learning. Its main contributions are as follows.

First, a micro-expression recognition algorithm based on separating facial muscle movement features from identity features is proposed. A bilateral-branch neural network is constructed: one branch is the micro-expression discrimination network, MENet, and the other is the identity discrimination network, IDNet. A diverse attention operation module is added to MENet, and a divergence loss function is constructed so that each branch of the attention module focuses on a different facial region as far as possible, which helps MENet extract more comprehensive facial muscle movement features. A global attention module is added to IDNet to help it attend to important identity information such as face shape and contour. The spatio-temporal motion features extracted by MENet and the identity features extracted by IDNet are fed together into a mutual information neural estimator, which yields a mutual information loss that separates the facial muscle motion features of the micro-expression from the identity features. The algorithm is trained in several stages; MENet is trained with a combined loss consisting of the cross-entropy loss, the divergence loss, and the mutual information loss, so that it extracts purer, more comprehensive, and more discriminative spatio-temporal motion features of micro-expressions. The algorithm was tested on the SDU, MMEW, SAMM, and CASME Ⅱ databases and achieved competitive accuracy, and experiments verified the effectiveness of each branch, module, and loss function.

Second, a micro-expression recognition algorithm combining contrastive learning with a cross-modal quintuplet loss is proposed, and an Action Unit (AU) mask generation network, MNet, is constructed. The AU mask generated by MNet is multiplied with each sample's frame sequence to provide an initial spatio-temporal attention that reduces redundant information in the samples. Based on psychologist Plutchik's interpretation of the intrinsic relationships among emotions, the hard type corresponding to each micro-expression category in training is identified, which lays the foundation for the cross-modal quintuplet loss function. Each quintuplet is built as follows: a micro-expression sample is taken as the anchor; another micro-expression sample of the same category as the anchor is selected as the positive micro-expression sample; a micro-expression sample of the anchor's hard type is selected as the negative micro-expression sample; a macro-expression sample of the same category as the anchor is selected as the positive macro-expression sample; and a macro-expression sample of the anchor's hard type is selected as the negative macro-expression sample. These five samples form a cross-modal quintuplet group, which is used to construct the cross-modal quintuplet loss function. The quintuplet groups enlarge the number of training samples, speed up network convergence, and enhance the representational capability of the features.

The RGB modality and the optical flow modality of each micro-expression sample are treated as two sensory views of the micro-expression. Positive and negative sample pairs are constructed from the RGB view and the optical flow view, and contrastive learning is performed to compute a contrastive loss, so that the model extracts spatio-temporal motion features common to both views. The RGB and optical flow features are fused and then passed to a fully connected layer for classification. MNet is trained with the cross-entropy loss, and the feature extraction modules are trained with a joint loss consisting of the cross-entropy loss, the cross-modal quintuplet loss, and the contrastive loss. CK+ is used as the macro-expression database. Extensive experiments on the SDU, MMEW, SAMM, and CASME Ⅱ databases achieve competitive recognition accuracy and verify the effectiveness of each loss function and module.
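The abstract does not give the form of the divergence loss used to make MENet's attention branches look at different facial regions. One simple way to do this, assumed here for illustration only, is to penalize the mean pairwise cosine similarity between the branches' flattened attention maps. The function name `divergence_loss` and the (K, H, W) map layout are hypothetical.

```python
import numpy as np


def divergence_loss(attn_maps: np.ndarray, eps: float = 1e-8) -> float:
    """Hypothetical divergence loss for K attention branches (K >= 2).

    attn_maps: array of shape (K, H, W), one spatial attention map per branch.
    Returns the mean cosine similarity over all distinct branch pairs, so the
    value is small when branches attend to non-overlapping facial regions.
    """
    k = attn_maps.shape[0]
    flat = attn_maps.reshape(k, -1)
    # L2-normalise each map so that dot products become cosine similarities
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + eps)
    sim = unit @ unit.T  # (K, K) cosine-similarity matrix
    # average only the off-diagonal entries (distinct branch pairs)
    off_diag = sim[~np.eye(k, dtype=bool)]
    return float(off_diag.mean())


# Two branches covering disjoint regions vs. two identical branches
disjoint = np.zeros((2, 4, 4))
disjoint[0, :2] = 1.0  # branch 0 attends to the top half
disjoint[1, 2:] = 1.0  # branch 1 attends to the bottom half
same = np.ones((2, 4, 4))
print(divergence_loss(disjoint))  # 0.0
print(divergence_loss(same))      # close to 1.0
```

Minimizing this quantity lowers the overlap between branches; in training it would be added to the classification loss with a weight the abstract does not state.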
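A mutual information neural estimator typically maximizes the Donsker-Varadhan lower bound I(X; Z) ≥ E_joint[T(x, z)] - log E_marginal[exp T(x, z)], where T is a trainable statistics network and the marginal term is evaluated on pairs whose pairing has been broken. The sketch below only evaluates this bound for a batch of motion and identity features. The fixed bilinear critic standing in for the trained network, the cyclic shift used to form mismatched pairs, and all function names are assumptions; the abstract does not describe the estimator's architecture or exactly how the estimate becomes a loss.

```python
import numpy as np


def dv_lower_bound(t_joint: np.ndarray, t_marginal: np.ndarray) -> float:
    """Donsker-Varadhan bound: mean(T on joint) - log(mean(exp(T on marginal)))."""
    # log-mean-exp computed stably by subtracting the maximum first
    m = t_marginal.max()
    log_mean_exp = m + np.log(np.mean(np.exp(t_marginal - m)))
    return float(t_joint.mean() - log_mean_exp)


def bilinear_critic(x: np.ndarray, z: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the statistics network: T(x, z) = x^T W z per row."""
    return np.einsum("bi,ij,bj->b", x, w, z)


def mine_estimate(motion: np.ndarray, identity: np.ndarray, w: np.ndarray) -> float:
    """Estimate I(motion; identity) for one batch of paired (B, D) features."""
    t_joint = bilinear_critic(motion, identity, w)
    # cyclic shift pairs every motion vector with another sample's identity
    # vector (no fixed points), approximating the product of the marginals
    mismatched = np.roll(identity, shift=1, axis=0)
    t_marginal = bilinear_critic(motion, mismatched, w)
    return dv_lower_bound(t_joint, t_marginal)


rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
y = rng.normal(size=(256, 8))  # independent of x
w = 0.5 * np.eye(8)            # fixed critic weights, illustrative only
dependent = mine_estimate(x, x, w)
independent = mine_estimate(x, y, w)
print(dependent, independent)  # the dependent pair yields the larger estimate
```

In adversarial disentanglement setups the estimator is usually trained to maximize this bound while the feature extractor is trained to minimize it; whether the thesis follows exactly this scheme is not stated in the abstract.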
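The abstract lists the terms of both training objectives but not their weights or exact forms. Written as weighted sums, with the weights λ and the notation as assumptions (f_m denotes MENet's motion features, f_id IDNet's identity features, and Î the estimate produced by the mutual information neural estimator), the two objectives would read:

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{MENet}} &= \mathcal{L}_{\mathrm{CE}}
  + \lambda_{\mathrm{div}}\,\mathcal{L}_{\mathrm{div}}
  + \lambda_{\mathrm{MI}}\,\hat{I}\left(\mathbf{f}_{m};\mathbf{f}_{id}\right) \\
\mathcal{L}_{\mathrm{feat}} &= \mathcal{L}_{\mathrm{CE}}
  + \lambda_{q}\,\mathcal{L}_{\mathrm{quint}}
  + \lambda_{c}\,\mathcal{L}_{\mathrm{con}}
\end{aligned}
```

Here the first line is the combined loss for MENet in the first algorithm, and the second is the joint loss for the feature extraction modules in the second algorithm; MNet itself is trained with the cross-entropy term alone.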
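Multiplying an AU mask with a sample's frame sequence amounts to element-wise multiplication with broadcasting. A minimal sketch, assuming frames are stored as a (T, H, W, C) array and the mask is either one (H, W) map shared by all frames or a per-frame (T, H, W) map with values in [0, 1]. MNet itself is not modeled, and the name `apply_au_mask` is hypothetical.

```python
import numpy as np


def apply_au_mask(frames: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Multiply a frame sequence by an AU mask (hypothetical helper).

    frames: (T, H, W, C) clip, e.g. an RGB micro-expression sequence.
    mask:   (H, W) shared mask or (T, H, W) per-frame mask, values in [0, 1].
    Regions outside the masked AU areas are attenuated in every frame.
    """
    if mask.ndim == 2:  # one spatial mask shared by all frames
        mask = mask[None]
    if mask.shape[1:] != frames.shape[1:3] or mask.shape[0] not in (1, frames.shape[0]):
        raise ValueError("mask shape does not match the frame sequence")
    # add a trailing channel axis so the mask broadcasts over C (and T if shared)
    return frames * mask[..., None]


clip = np.ones((3, 4, 4, 3))   # 3 frames, 4x4 pixels, RGB
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0           # keep only a central 2x2 "AU" region
masked = apply_au_mask(clip, mask)
print(masked.sum())            # 3 frames * 4 pixels * 3 channels = 36.0
```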
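The quintuplet construction described in the abstract can be sketched as index selection over labeled micro- and macro-expression training sets. This is a minimal sketch only: the hard-type table `HARD_TYPE` is a made-up placeholder (the thesis derives its own mapping from Plutchik's model, and the abstract does not list it), and the category names and the function name `build_quintuplet` are hypothetical.

```python
import random

# Illustrative placeholder only, NOT the thesis's mapping: each category is
# paired with a hypothetical "hard type" used to pick negative samples.
HARD_TYPE = {
    "happiness": "surprise",
    "surprise": "fear",
    "fear": "surprise",
    "disgust": "anger",
    "anger": "disgust",
}


def build_quintuplet(anchor_idx, micro_labels, macro_labels, rng=random):
    """Return indices (anchor, micro_pos, micro_neg, macro_pos, macro_neg).

    micro_labels / macro_labels: category names of the micro-expression and
    macro-expression training samples (e.g. CK+ for the macro set).
    """
    label = micro_labels[anchor_idx]
    hard = HARD_TYPE[label]
    # same category, different micro-expression sample
    micro_pos = [i for i, y in enumerate(micro_labels) if y == label and i != anchor_idx]
    micro_neg = [i for i, y in enumerate(micro_labels) if y == hard]
    macro_pos = [i for i, y in enumerate(macro_labels) if y == label]
    macro_neg = [i for i, y in enumerate(macro_labels) if y == hard]
    if not (micro_pos and micro_neg and macro_pos and macro_neg):
        raise ValueError(f"cannot build a quintuplet for category {label!r}")
    return (anchor_idx, rng.choice(micro_pos), rng.choice(micro_neg),
            rng.choice(macro_pos), rng.choice(macro_neg))


micro = ["happiness", "happiness", "surprise", "disgust", "anger"]
macro = ["surprise", "happiness", "anger", "happiness"]
print(build_quintuplet(0, micro, macro, random.Random(1)))
```

Raising an error when a category has no valid partner keeps the sketch explicit; a training loop might instead skip such anchors.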
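The abstract does not give the formula of the cross-modal quintuplet loss. One plausible form, assumed here, sums two triplet-style hinge terms on the micro-expression anchor: one against the micro-expression positive and negative, and one against the macro-expression positive and negative. The squared Euclidean distance, the margin value, and the function names are hypothetical choices.

```python
import numpy as np


def triplet_term(anchor, pos, neg, margin):
    """Hinge on squared Euclidean distances: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = np.sum((anchor - pos) ** 2, axis=-1)
    d_neg = np.sum((anchor - neg) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)


def quintuplet_loss(a, micro_pos, micro_neg, macro_pos, macro_neg, margin=0.2):
    """Hypothetical cross-modal quintuplet loss averaged over a batch.

    All inputs are (B, D) embeddings; `a` holds the micro-expression anchors.
    The first term acts within the micro-expression modality; the second pulls
    anchors toward same-category macro-expressions and pushes them away from
    macro-expressions of the hard type.
    """
    intra = triplet_term(a, micro_pos, micro_neg, margin)
    cross = triplet_term(a, macro_pos, macro_neg, margin)
    return float(np.mean(intra + cross))


a = np.array([[0.0, 0.0]])
near = np.array([[0.0, 0.0]])
far = np.array([[1.0, 0.0]])
print(quintuplet_loss(a, near, far, near, far))  # 0.0: positives closer by more than the margin
print(quintuplet_loss(a, far, near, far, near))  # 2.4: both terms violated by 1 + 0.2
```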
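The abstract does not specify the contrastive loss between the RGB and optical flow views. A common choice, assumed here, is a symmetric InfoNCE loss in which the RGB and optical flow embeddings of the same clip form the positive pair and the other clips in the batch serve as negatives. The temperature value and the function names are hypothetical.

```python
import numpy as np


def _log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def cross_view_info_nce(rgb: np.ndarray, flow: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE between RGB and optical-flow embeddings.

    rgb, flow: (B, D) features of the same B clips. Row i of each view forms
    the positive pair; all other rows in the batch act as negatives.
    """
    rgb = rgb / np.linalg.norm(rgb, axis=1, keepdims=True)
    flow = flow / np.linalg.norm(flow, axis=1, keepdims=True)
    logits = rgb @ flow.T / temperature  # (B, B) scaled cosine similarities
    idx = np.arange(len(rgb))
    loss_r2f = -_log_softmax(logits, axis=1)[idx, idx].mean()  # RGB anchor vs. all flows
    loss_f2r = -_log_softmax(logits, axis=0)[idx, idx].mean()  # flow anchor vs. all RGBs
    return float((loss_r2f + loss_f2r) / 2)


rng = np.random.default_rng(0)
rgb = rng.normal(size=(16, 32))
aligned = cross_view_info_nce(rgb, rgb + 0.05 * rng.normal(size=(16, 32)))
mismatched = cross_view_info_nce(rgb, np.roll(rgb, 1, axis=0))
print(aligned, mismatched)  # views that agree give the much lower loss
```

Averaging both directions treats the two views symmetrically, which matches the abstract's aim of features common to both views.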
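The abstract states only that the RGB and optical flow features are fused and classified by a fully connected layer. A minimal sketch, assuming fusion by concatenation and a softmax classifier trained with cross-entropy; the fusion operator, the shapes, and the function names are hypothetical.

```python
import numpy as np


def fuse_and_classify(rgb_feat, flow_feat, weight, bias):
    """Concatenate the two views' features and apply one fully connected layer.

    rgb_feat: (B, D1), flow_feat: (B, D2); weight: (D1 + D2, K); bias: (K,).
    Returns class probabilities of shape (B, K).
    """
    fused = np.concatenate([rgb_feat, flow_feat], axis=1)  # (B, D1 + D2)
    logits = fused @ weight + bias
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true classes."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))


rng = np.random.default_rng(0)
rgb_feat = rng.normal(size=(4, 3))
flow_feat = rng.normal(size=(4, 3))
weight = np.zeros((6, 3))  # untrained layer -> uniform probabilities
bias = np.zeros(3)
probs = fuse_and_classify(rgb_feat, flow_feat, weight, bias)
loss = cross_entropy(probs, np.array([0, 1, 2, 0]))
print(probs.shape, round(loss, 4))  # (4, 3) 1.0986
```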
Keywords/Search Tags: Micro-expression recognition, Deep learning, Attention mechanism, Contrastive learning, Cross-modal quintuplet loss