
Video Sequence-oriented Expression Recognition Method Based On Self-built Expression Dataset And Improved Transformer Framework

Posted on: 2024-03-10   Degree: Master   Type: Thesis
Country: China   Candidate: J H Wan   Full Text: PDF
GTID: 2568307112976839   Subject: Electronic information
Abstract/Summary:
Facial expression recognition is a key research direction in computer vision that aims to automatically recognize the expressions in facial images. Facial expressions convey people's most genuine emotional changes and have potential applications in emotion recognition, human-computer interaction, face recognition, and other fields. Traditional video sequence-oriented expression recognition has two shortcomings. First, it often considers only 2D facial information, which cannot fully reflect the shape and features of the face and often degrades recognition performance. Second, sequence learning models such as RNN or LSTM can only relate adjacent or nearby frames and cannot learn correlations across all frames. In addition, existing video-sequence facial expression datasets have many shortcomings, such as limited data volume, inaccurate labeling, incomplete age coverage, and a lack of diversity. For example, the AFEW 8.0 video clips are taken from different movies, but most expressions in movies are performed by actors and therefore lack authenticity and naturalness. This thesis studies and extends facial expression recognition and the related datasets. The main contributions are as follows:

(1) The existing Vision Transformer (ViT) framework is improved. The 3D coordinates of facial keypoints and the 2D deep-network facial features trained for expression classification are fed together into a ViT with an embedded attention module. Comparative experiments are conducted on the CK+ and AFEW datasets, and ablation experiments are conducted on the self-built JXNU-Expression dataset and on AFEW. The results show that this framework greatly improves recognition accuracy and stability, achieves full correlation learning between frames, and recognizes facial expressions accurately even when profile faces or other rare angles appear. The experiments also show that the self-built dataset is challenging.

(2) JXNU-Expression, a general-purpose facial expression dataset, is designed and produced. The most authentic and natural expression clips are selected from variety shows, reality shows, and documentaries according to context, and 680 located video sequences are edited to remove redundant face and environmental information while retaining as much of the facial content as possible, which facilitates subsequent expression recognition. The position of each expression in the source video is recorded, so that the attribute of the expression can be inferred from the context. The self-built dataset contains faces of different ages, races, and genders, which enriches its diversity and makes it a useful supplement to existing facial expression datasets.

(3) JXNU-Microexpression, an online micro-expression dataset of natural and inhibited expression responses to stimuli, is independently designed and produced. Three kinds of stimulus videos that readily elicit expression reactions within a short time are carefully selected from movie, documentary, and short-video platforms. Video sequences of subjects' natural and inhibited reactions while watching the three kinds of stimulus videos are collected using a self-built collection environment and a purpose-designed micro-expression recognition system, and the collected reaction data are processed and analyzed with the improved expression recognition algorithm and other related methods.
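The contrast the abstract draws between recurrent models and "full correlation learning between frames" can be illustrated with a small sketch. The code below is not the thesis's architecture: it assumes, purely for illustration, that each frame's token is the concatenation of flattened 3D keypoint coordinates and a 2D feature vector, and it applies single-head scaled dot-product self-attention with identity projections (no learned weights, no positional embedding). The names frame_tokens and self_attention and the sizes T, K and F are hypothetical, and the inputs are random placeholders rather than real data.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def frame_tokens(keypoints_3d, features_2d):
    """Build one token per frame: flattened (x, y, z) keypoints + 2D features."""
    tokens = []
    for points, feat in zip(keypoints_3d, features_2d):
        flat = [coord for point in points for coord in point]
        tokens.append(flat + list(feat))
    return tokens


def self_attention(tokens):
    """Single-head scaled dot-product self-attention across frames.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections), which keeps the sketch free of learned weights.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


random.seed(0)
T, K, F = 6, 5, 8  # frames, keypoints per frame, 2D feature size (hypothetical)
keypoints_3d = [
    [[random.uniform(-1, 1) for _ in range(3)] for _ in range(K)]
    for _ in range(T)
]
features_2d = [[random.uniform(-1, 1) for _ in range(F)] for _ in range(T)]

tokens = frame_tokens(keypoints_3d, features_2d)
outputs, weights = self_attention(tokens)

# weights is a T x T matrix: row i holds frame i's attention over all frames.
print(len(weights), len(weights[0]))
```

Every entry of the attention matrix is strictly positive, so each frame draws directly on every other frame in a single step, whereas a recurrent model has to pass information from frame to frame through its hidden state.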
Keywords/Search Tags: Facial expression recognition, Expression dataset, Face 3D reconstruction, Micro-expression