
Research On Multimodal Dimensional Emotion Recognition

Posted on: 2023-04-02    Degree: Master    Type: Thesis
Country: China    Candidate: Z H Chen    Full Text: PDF
GTID: 2568306779489064    Subject: Software engineering
Abstract/Summary:
With the development of artificial intelligence and deep learning technology, emotion recognition has become an important part of human-computer interaction and a growing focus of research. The purpose of in-depth research in this area is to make human-machine systems more humanized and intelligent in emotional interaction, giving the system the ability to analyze, understand, and learn from human emotions. Building on current achievements in the computer field, this paper proposes a dimensional emotion recognition method based on multi-stage hybrid fusion, conducts unimodal, bimodal, and multimodal experiments on two public datasets, IEMOCAP and MSPIN, and compares the method against current state-of-the-art baselines. The specific work is as follows:

(1) Bimodal dimensional emotion recognition suffers from limited prediction performance because the information it draws on is incomplete. To better capture the rich emotional content of human communication, and because motion-capture data has received little attention in dimensional emotion recognition, this paper combines motion-capture data, which expresses emotion non-verbally, with speech and text features to infer the expressed emotion. Based on the characteristics of motion-capture data, the modality is modeled with 1D-CNN and 2D-CNN networks. In the training stage, a multi-task learning mechanism simultaneously predicts the three emotional dimensions of valence, arousal, and dominance against their true labels (see the first sketch below). This is intended to improve feature quality and thereby secure the prediction performance of the subsequent decision-level fusion, which is based on a machine-learning regression algorithm. The experimental results show that although motion-capture data performs less well than speech and text in unimodal experiments, adding it to the bimodal setting is especially helpful for improving the valence dimension among the three emotional dimensions.

(2) How to achieve a complete representation within each modality, and how to choose the best method for fusing features across modalities, are current challenges for researchers. To address the inability of traditional decision-level fusion to account for the consistency and correlation between different modal features, a hybrid fusion method is proposed for multimodal emotion recognition. First, a concatenation network is constructed by joining LSTM, BiLSTM, and 2D-CNN networks with dense layers through the concatenate operation. Then the unimodal features and the concatenated features obtained from the trained deep learning networks are used as input to an SGD model, which applies regression analysis to map the input data to the given labels and produce the final predictions (see the second sketch below). Experimental results show that this fusion method effectively improves prediction performance and outperforms both feature-level and decision-level fusion.

(3) On the basis of the multimodal experiments, further experiments with other machine-learning regression algorithms were carried out to justify the choice of regression algorithm. Finally, to make full use of the output of the SGD model, a multi-stage fusion strategy is proposed: the output of the SGD model at the current stage serves as the input to the SGD model at the next stage (see the third sketch below). With this strategy, the CCC values of the three emotional dimensions and the mean CCC are both improved.
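As an illustration of the multi-task setup in (1), the following minimal sketch builds a shared 1D-CNN over motion-capture sequences with one regression head per emotional dimension. It assumes Keras/TensorFlow; the input shape (200 frames x 165 marker channels), layer sizes, and losses are placeholders for illustration, not values taken from the thesis.

# Multi-task 1D-CNN sketch: shared trunk, three regression heads (assumed setup)
from tensorflow.keras import layers, Model

def build_mocap_model(timesteps=200, channels=165):
    # Shared 1D-CNN trunk over motion-capture frames
    inp = layers.Input(shape=(timesteps, channels), name="mocap")
    x = layers.Conv1D(64, kernel_size=5, activation="relu")(inp)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(128, kernel_size=3, activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(128, activation="relu")(x)
    # One regression head per emotional dimension (multi-task learning)
    valence = layers.Dense(1, name="valence")(x)
    arousal = layers.Dense(1, name="arousal")(x)
    dominance = layers.Dense(1, name="dominance")(x)
    model = Model(inp, [valence, arousal, dominance])
    model.compile(optimizer="adam",
                  loss={"valence": "mse", "arousal": "mse", "dominance": "mse"})
    return model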
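The hybrid fusion in (2) could look roughly like the following: modality-specific branches (LSTM for speech, BiLSTM for text, 2D-CNN for motion capture) joined through a Concatenate layer and dense layers, with scikit-learn's SGDRegressor as the decision stage. All shapes, branch assignments, and hyperparameters here are assumptions, and the random arrays stand in for real extracted features.

# Hybrid fusion sketch: concatenation network + SGD decision stage (assumed shapes)
import numpy as np
from tensorflow.keras import layers, Model
from sklearn.linear_model import SGDRegressor

speech_in = layers.Input(shape=(100, 40), name="speech")     # e.g. frame-level acoustics
s = layers.LSTM(64)(speech_in)

text_in = layers.Input(shape=(50, 300), name="text")         # e.g. word embeddings
t = layers.Bidirectional(layers.LSTM(64))(text_in)

mocap_in = layers.Input(shape=(200, 165, 1), name="mocap")   # motion-capture "image"
m = layers.Conv2D(32, (3, 3), activation="relu")(mocap_in)
m = layers.GlobalAveragePooling2D()(m)

# Concatenate the branch outputs and pass them through dense layers
fused = layers.Concatenate()([s, t, m])
fused = layers.Dense(128, activation="relu")(fused)
out = layers.Dense(1, name="dimension")(fused)
net = Model([speech_in, text_in, mocap_in], out)

# Decision stage: unimodal and concatenated features are stacked and mapped
# onto the dimensional label by an SGD regressor (placeholder data below)
feats = np.random.rand(64, 224)      # 64 samples x (64 + 128 + 32) stacked features
labels = np.random.rand(64)
sgd = SGDRegressor(max_iter=1000, tol=1e-3)
sgd.fit(feats, labels)
final_preds = sgd.predict(feats)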
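One minimal reading of the multi-stage strategy in (3) is sketched below: the per-dimension predictions of one SGD stage become the input features of the next stage. The number of stages, the per-dimension arrangement, and the synthetic data are all assumptions made for illustration.

# Multi-stage SGD sketch: each stage's outputs feed the next stage (assumed design)
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.random((100, 16))            # fused features from the previous step (placeholder)
Y = rng.random((100, 3))             # valence, arousal, dominance labels (placeholder)

stage_input = X
for stage in range(2):               # number of extra stages is an assumption
    preds = []
    for d in range(3):               # one SGD regressor per emotional dimension
        model = SGDRegressor(max_iter=1000, tol=1e-3)
        model.fit(stage_input, Y[:, d])
        preds.append(model.predict(stage_input))
    # The three dimensional predictions of this stage become the
    # input features of the next stage's SGD models
    stage_input = np.column_stack(preds)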
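For reference, the CCC (concordance correlation coefficient) used as the evaluation metric above is conventionally computed as follows; this is the standard definition, not code from the thesis.

# Concordance correlation coefficient: agreement between predictions and labels
import numpy as np

def ccc(y_true, y_pred):
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)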
Keywords/Search Tags: Dimensional emotion recognition, Stochastic gradient descent, Multimodal, Feature fusion, Motion-capture