Research On Emotion Recognition Method Based On 2D-3D Gait Features In Videos

Posted on: 2023-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Yu
Full Text: PDF
GTID: 2558307061953509
Subject: Pattern recognition and intelligent system
Abstract/Summary:
Emotion plays a vital role in daily life, and teaching machines to perceive human emotion has become one of the most active research directions. Because emotional expression is subjective, it is difficult to find an accurate quantitative method for machine emotion recognition. Gait, one of the most basic human behaviors, also carries rich emotional information. Gait features have three main advantages as a medium for emotion recognition: they can be recognized from a distance, they are difficult to disguise, and they are easy to record. This thesis therefore uses gait features extracted from videos as the entry point for emotion recognition.

Emotion recognition based on gait features currently faces several challenges. First, the 3D gait information obtained from monocular images is of low accuracy. Accurate gait features are the basis of emotion recognition, but changes in camera angle and occlusion of the human body make it difficult to extract 3D gait features directly from images. Second, a gait emotion recognition network is difficult to build. Emotion recognition is a frontier research field with few existing results to draw on, so the algorithmic framework still needs to be developed. In addition, emotional states must be inferred from consecutive frames, so constructing a network that can capture both spatial and temporal information is a further difficulty. Third, emotion datasets are sorely lacking. Large, high-quality datasets can improve network accuracy, but very few public emotion datasets exist, and gait emotion datasets are rarer still, which slows research progress.

To address these difficulties, this thesis first extracts 2D gait features from monocular images, then transforms the 2D gait features into 3D gait features, and finally feeds the 3D gait features into an emotion recognition network. The specific contributions are as follows.

(1) A 2D gait feature extraction network based on monocular images is constructed. Because gait feature extraction in earlier work is prone to occlusion interference, this thesis uses the Mask R-CNN network structure to perform classification, detection, and segmentation in parallel, ensuring accurate extraction of gait key points. To address inaccurate region-of-interest boundaries, ROI Align is used to adjust the boundaries and improve network accuracy. The network reaches an accuracy of 63.1% on the COCO2017 dataset, which provides a good foundation for subsequent 3D gait feature extraction.

(2) A 3D gait feature extraction network based on video frames and 2D gait features is constructed. Fully supervised networks that map videos directly to 3D gait features give inaccurate results, so a semi-supervised learning method is adopted in which the 2D gait features supervise the intermediate stage. To address the difficulty of capturing temporal information in video, temporal dilated convolutions at different scales are used to enlarge the receptive field of the network from 3 frames to 81 frames, which aids temporal feature extraction. The error on the Human3.6M dataset is only 47.7 mm, which provides suitable 3D gait features for subsequent emotion recognition.

(3) An emotion recognition network based on 3D gait feature sequences from videos is constructed. To address the difficulty of associating spatial and temporal information in a gait feature sequence, an ST-GCN module is used to learn the spatio-temporal information of the gait skeleton simultaneously. To reduce the effect of large differences among individuals, multi-task learning combines emotion recognition with identity recognition, and features shared between the two tasks further improve emotion recognition accuracy. An accuracy of 87.2% is achieved on the self-built Gait Emotion dataset.

(4) A large-scale gait emotion dataset named Gait Emotion is constructed. It contains 1007 gait videos with emotion annotations and provides reliable data support for future research.
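
The growth of the temporal receptive field from 3 to 81 frames mentioned in contribution (2) can be checked with a short calculation. The sketch below is illustrative only: the abstract does not state the kernel width or the dilation schedule, so a kernel width of 3 and dilations of 1, 3, 9, and 27 are assumed here because that combination reproduces the reported figures. The function name `receptive_field` is hypothetical and not taken from the thesis.

```python
def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output of a stack of
    stride-1 dilated 1D convolutions applied one after another."""
    frames = 1
    for d in dilations:
        # Each layer widens the window by (kernel_size - 1) * dilation frames.
        frames += (kernel_size - 1) * d
    return frames


# Assumed configuration (not stated in the abstract):
# kernel width 3, dilation tripling at each layer.
dilations = [3 ** i for i in range(4)]  # [1, 3, 9, 27]

# Receptive field after the first 1, 2, 3, and 4 layers.
growth = [receptive_field(3, dilations[:n]) for n in range(1, len(dilations) + 1)]
print(growth)  # [3, 9, 27, 81]
```

Under these assumptions, a single layer covers 3 frames and four stacked layers cover 81 frames, matching the range given in the abstract. A different kernel width or dilation schedule could reach the same endpoint, so this should be read as one plausible configuration.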
Keywords/Search Tags: gait feature, emotion recognition, semi-supervised learning, spatio-temporal convolution, multi-task learning