
Multimodal Sentiment Analysis Based On Attention Mechanism

Posted on: 2023-01-04    Degree: Master    Type: Thesis
Country: China    Candidate: Y F Song    Full Text: PDF
GTID: 2568306746481314    Subject: Computer technology
Abstract/Summary:
With the progress of science and technology, most people now own intelligent machines or intelligent terminals. Taking the intelligent voice assistant as an example, only by assessing human emotions rapidly and accurately can it fully understand human emotions and intentions and produce a more intelligent response. In smart medicine, intelligent recognition of a patient's emotional state helps to relieve the patient's emotions and helps doctors improve treatment plans. In intelligent transportation, the driver's emotional state is monitored in real time; if an abnormal condition such as drunk or fatigued driving is detected, a reminder is sent immediately, effectively avoiding traffic accidents. In the era of artificial intelligence, human-computer interaction is becoming more and more common, and users hold intelligent machines to ever higher standards of experience: people hope that machines can observe and understand their emotions as humans do.

Sentiment analysis is an important research direction in the field of human-computer interaction. It is the process of analyzing, processing, summarizing, and reasoning over subjective data that carries emotional color. In traditional sentiment analysis research, the main medium is text. In recent years, with the rapid rise of multimedia networks, both content creators and ordinary Internet users increasingly use video to express their emotions. Video contains the facial expressions and voices of the people in it and therefore carries richer emotional information than text, so sentiment analysis based on multimodal data has great advantages. In the future, artificial intelligence will be a key driver of economic growth, and countries all over the world are competing fiercely in this field. In this context, this study takes multimodal sentiment analysis as its main research object and uses the attention mechanism and its variants to fully capture the interactive information between multimodal data. The main work and contributions are as follows:

1. Aiming at the problems of intra-modality feature representation and inter-modality feature fusion in multimodal sentiment analysis, this thesis proposes a multi-level hybrid fusion model based on the attention mechanism and multi-task learning. First, a convolutional neural network and a bi-directional gated recurrent unit extract features within each modality; second, a cross-modal attention mechanism performs pairwise feature fusion between modalities; third, a self-attention mechanism selects the contribution of each modality at different levels; finally, combined with multi-task learning, the model produces both sentiment and emotion classification results. Experiments on the CMU-MOSEI dataset show that the method improves the accuracy and F1-score of sentiment and emotion classification.

2. A multimodal sentiment analysis model based on cross-modal attention and double pooling is proposed. The dynamic relationship between modalities is modeled with the cross-modal attention mechanism, and maximum pooling and average pooling are combined to mine the salient features and overall features in the interaction information for sentiment classification (a minimal sketch of this mechanism follows the list below). Experiments on the CMU-MOSEI dataset show that the proposed model outperforms the baseline models in accuracy and F1-score.
3. A sentiment analysis method based on large-scale pre-trained text and audio models is proposed. Pre-trained models from different fields are fused at the modality level: ALBERT (text) and PANNs (audio) are used to train the model end-to-end from the raw text and audio. On the CMU-MOSI dataset the method reaches an accuracy of 84.98% and an F1-score of 85.08%.

4. Sarcasm and humor are two widely used figurative devices in human communication, which makes them important problems in NLP. A huge amount of social media content is produced daily, including sarcastic and humorous content in multimodal form. Prior work has analyzed sarcasm and humor as two separate problems, but their relationship on social media remains largely unexplored. To this end, we focus on a subset of social media, i.e., TV talk shows, and propose a multimodal Chinese sarcasm and humor dataset with 6,124 annotated data points, each consisting of text and the corresponding audio and video clips. Furthermore, we propose the Multimodal Multi-gate Mixture-of-Experts (MMMoE) model, in which self-attention and co-attention are combined to model intra-modality and inter-modality dynamics, respectively (a gating sketch also follows the list). Detailed analysis and experiments on the dataset show that MMMoE outperforms baseline fusion models on sarcasm and humor detection.
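To make the cross-modal attention and double-pooling idea from contributions 1 and 2 concrete, the following is a minimal PyTorch sketch. It is not the thesis implementation: the module name, the feature dimensions, and the choice of letting text queries attend to audio are illustrative assumptions.

```python
import torch
import torch.nn as nn


class CrossModalDoublePooling(nn.Module):
    def __init__(self, dim: int = 128, num_heads: int = 4, num_classes: int = 2):
        super().__init__()
        # Queries come from one modality, keys/values from the other, so the
        # attention weights model the dynamic relationship between modalities.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Max pooling keeps salient features, average pooling keeps overall
        # features; their concatenation feeds the sentiment classifier.
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, text_feat, audio_feat):
        # text_feat: (batch, text_len, dim), audio_feat: (batch, audio_len, dim)
        fused, _ = self.cross_attn(query=text_feat, key=audio_feat, value=audio_feat)
        salient = fused.max(dim=1).values   # max pooling over time steps
        overall = fused.mean(dim=1)         # average pooling over time steps
        return self.classifier(torch.cat([salient, overall], dim=-1))


# Toy usage with random tensors standing in for single-modality encoder outputs.
model = CrossModalDoublePooling()
logits = model(torch.randn(8, 20, 128), torch.randn(8, 50, 128))  # shape (8, 2)
```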
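The multi-gate mixture-of-experts component of MMMoE (contribution 4) can be sketched in the same spirit. This is an assumption about the general multi-gate MoE structure rather than the released model: the expert and gate sizes and the two binary task heads (sarcasm, humor) are illustrative.

```python
import torch
import torch.nn as nn


class MultiGateMoE(nn.Module):
    def __init__(self, dim: int = 128, num_experts: int = 4, num_tasks: int = 2):
        super().__init__()
        # Shared experts transform the fused multimodal representation.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_experts)]
        )
        # One gate and one binary classification head per task (sarcasm, humor).
        self.gates = nn.ModuleList([nn.Linear(dim, num_experts) for _ in range(num_tasks)])
        self.heads = nn.ModuleList([nn.Linear(dim, 2) for _ in range(num_tasks)])

    def forward(self, fused):
        # fused: (batch, dim) representation from self-/co-attention fusion.
        expert_out = torch.stack([e(fused) for e in self.experts], dim=1)  # (B, E, dim)
        logits = []
        for gate, head in zip(self.gates, self.heads):
            weights = torch.softmax(gate(fused), dim=-1).unsqueeze(-1)     # (B, E, 1)
            task_repr = (weights * expert_out).sum(dim=1)                  # (B, dim)
            logits.append(head(task_repr))                                 # per-task logits
        return logits


# Toy usage: one fused vector per sample, one output per task (sarcasm, humor).
moe = MultiGateMoE()
sarcasm_logits, humor_logits = moe(torch.randn(8, 128))
```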
Keywords/Search Tags: Attention mechanism, Multimodal sentiment analysis, Implicit sentiment, Transfer learning, Data set