
Research On Key Technologies And Applications Of Automatic Micro-expression Recognition

Posted on: 2024-07-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S R Zhao
Full Text: PDF
GTID: 1528306932457714
Subject: Computer application technology
Abstract/Summary:
As a typical micro-reaction under psychological stress, micro-expressions (MEs) are spontaneous and unconscious facial reactions that usually occur in stressful situations, accompanying individuals’ attempts to conceal their true emotions and intentions. Compared to ordinary expressions, MEs have an extremely short duration (0.065 s to 0.5 s), occur only in local regions of the face, and involve subtle movements that are difficult for human eyes to perceive. However, MEs are spontaneous and cannot be faked, making them valuable for revealing an individual’s true feelings and intentions. As a result, accurate ME recognition has important applications in fields such as criminal investigation, clinical psychology diagnosis, and business negotiation.

In recent years, micro-expression recognition (MER) has attracted the attention of scholars in applied psychology and computer science, who have proposed various methods for MER. Work in applied psychology mainly trains individuals to recognize MEs. However, manual analysis of MEs is time-consuming and labor-intensive, with low recognition accuracy. Therefore, there is an urgent need to leverage the perceptual and computing capabilities of computers to recognize MEs automatically. Although various automatic MER methods have recently emerged in computer vision and affective computing, they do not adequately meet the requirements of practical application scenarios and face numerous challenges. These challenges mainly include five aspects: 1) ME datasets contain too few samples, leading to severe overfitting of deep learning MER models; 2) the ME reaction is extremely short and must be captured with high-speed cameras, resulting in spatiotemporal redundancy in the data; 3) the key apex frames contain the main emotional information, but they are easily confused with noise, making accurate detection difficult; 4) the movements of MEs are local and subtle, making it difficult to learn discriminative spatiotemporal features for MER; 5) diverse ME data is scarce and difficult to collect, resulting in weak generalization performance of MER models.

To address the above challenges, this study primarily explores the representation of key spatiotemporal information in ME videos, apex frame spotting, the learning of local spatiotemporal dynamic features of MEs with limited training samples, and enhanced recognition based on micro-expression generation. The goal is to build an MER model with high accuracy and good generalization performance. The main research contents and contributions of this dissertation are as follows:

Firstly, we propose a novel MER method based on a Siamese 3D CNN (MERSiamC3D) and a two-stage learning approach, aimed at extracting the key spatiotemporal information of original ME videos and learning spatiotemporal ME features with limited training samples. Specifically, the original ME videos captured by high-speed cameras are redundant and noisy, which is not conducive to learning subtle spatiotemporal motion features. To address this problem, we propose a key-frame sequence construction method based on adaptive sampling and apex frames, and estimate dense optical flow to represent the original ME. Furthermore, inspired by the way human infants learn new things, we decompose the process by which the model learns ME spatiotemporal features into two stages, the prior learning stage and the target learning stage, to compensate for the lack of sufficient training samples. In the prior learning stage, we construct positive and negative samples as model inputs and let the designed MERSiamC3D acquire the ability to perceive general visual features of MEs by completing a "yes or no" discrimination task. Based on this, in the target learning stage, we further fine-tune the network structure and use the original ME samples to continue training the model and complete the target classification. Experimental results on three publicly available spontaneous ME datasets demonstrate that the proposed method’s recognition performance is significantly better than that of the baseline models. In addition, the rationality and effectiveness of each module of our method are discussed in detail through ablation experiments. This study provides a new idea for how to train models with scarce data.

Secondly, we propose a novel MER method that combines apex frame spotting with a prototypical attention network. Current MER methods depend on manually annotated apex frames, and existing apex frame spotting algorithms are easily affected by noise and detect apex frames inaccurately. In response, we propose a Unimodal Pattern Constrained (UPC)-based apex frame spotting method and use the detected apex frames to extract ME key-frame sequences in RGB format for model training. In addition, since the simple inductive bias of the prototypical network can help the model learn better from small-scale data, we propose a deep prototype learning framework for MER called ME-PLAN (Prototypical Learning with Local Attention Network). The ME-PLAN model consists of a 3D residual prototype network and local attention: the former aims to learn accurate ME feature representations (i.e., ME prototypes) through expression-related knowledge transfer and episodic training, while the latter enhances the model’s perception of local ME motions by introducing visual attention mechanisms. Through extensive qualitative and quantitative experiments on a composite database, we demonstrate the effectiveness and superiority of the proposed apex frame spotting algorithm and MER method. In particular, ME-PLAN achieves competitive recognition performance even without relying on time-consuming optical flow sequences as input.

Thirdly, to address the lack of diverse ME data to support the training of MER models with strong generalization performance, we approach the problem from the perspectives of micro-expression generation (MEG) and data augmentation, and propose an MEG method based on Thin-Plate Spline (TPS) and relative action units (RAU). We generate a large amount of reliably labeled ME training data to improve the accuracy and generalization of MER models. Specifically, we first adopt TPS to model the nonlinear motion transformation of MEs, and introduce the relative AU vector of the source ME with respect to the target face as a generation condition during MEG. This allows the generation model to focus on the facial region where MEs occur while ignoring expression-independent movements, thereby generating fine-grained MEs. Subsequently, we use existing labeled ME samples and collect sufficient, easily accessible facial templates to drive our MEG model to synthesize a large amount of ME data covering different genders and both Eastern and Western facial features for training data augmentation. In terms of MEG, we conduct generation experiments following the rules of the Generation track of the Micro-Expression Grand Challenge (MEGC2022) and validate the effectiveness of our MEG method through qualitative and quantitative evaluations. In terms of MER, we use the generated ME data to expand three existing public ME datasets and conduct three-class MER experiments, improving the recognition accuracy and generalization of existing recognition models. Notably, the MEG method proposed in this study achieved the runner-up position in the MEGC2022 Generation track.
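As an illustration of the prototype idea behind ME-PLAN mentioned above, the following is a minimal sketch of nearest-prototype classification in the general style of prototypical networks. It is not the dissertation's implementation: the 2-D embeddings, the three class labels, and the helper names `mean_vector`, `build_prototypes`, and `classify` are all assumptions made for this example, and a real model would obtain its embeddings from the 3D residual network with local attention rather than from hand-written vectors.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_prototypes(support):
    """Map each class label to the mean of its support embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}


def classify(query, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical 2-D embeddings for one three-class episode (assumed values,
# standing in for features produced by a trained embedding network).
support = {
    "negative": [[0.0, 0.0], [0.2, 0.0]],
    "positive": [[1.0, 1.0], [1.0, 0.8]],
    "surprise": [[0.0, 1.0], [0.2, 1.0]],
}

prototypes = build_prototypes(support)
print(classify([0.9, 0.9], prototypes))  # prints "positive"
```

Because each class is summarized by a single mean vector, the classifier has very few free parameters, which is one reason this kind of inductive bias is often considered suitable for small datasets.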
Keywords/Search Tags: Spontaneous Micro-expression Recognition, Micro-expression Spotting, Micro-expression Generation, Deep Learning, Few-shot Learning