In recent years, with the rapid development of smart terminals and internet technology, an increasing number of users have been expressing their emotions on social platforms, and the data on these platforms have evolved from a single modality to a multimodal form. With the widespread use of multimodal data, unimodal sentiment analysis methods can no longer handle such data effectively or exploit the diverse information hidden within it. As a result, multimodal sentiment analysis has emerged as a new research hotspot. Existing multimodal sentiment analysis models typically integrate multimodal data through feature-level fusion, decision-level fusion, or hybrid fusion. However, existing hybrid fusion methods do not fuse all three modalities simultaneously during feature-level and decision-level fusion, and therefore fail to capture the interactive information among the modalities. Furthermore, existing models often assume that the multimodal sentiment data are complete, whereas in practice uncertain missing modalities are common, rendering these models ineffective. To address these issues, this thesis conducts the following research:

1) To address the failure of existing hybrid fusion methods to capture all modality features during feature-level and decision-level fusion, this thesis proposes a multimodal sentiment analysis model based on BiGRU and an attention-based hybrid fusion strategy. The model uses BiGRU to extract unimodal features and a bimodal attention fusion module to fuse each pair of unimodal features. To capture the interaction among all three modalities during both feature-level and decision-level fusion, trimodal attention fusion and trimodal concatenation fusion are proposed to obtain two sets of fused trimodal features. The two sets of trimodal features are then classified separately, and the classification results are fused at the decision level for sentiment analysis. Experimental results demonstrate that the proposed model effectively captures the interaction among all modalities and significantly improves performance. A code sketch of this fusion pipeline is given below.

2) To address uncertain missing modalities in multimodal sentiment analysis, this thesis proposes a modality-translation-based multimodal sentiment analysis model for uncertain missing modalities. The model first uses Transformer encoders to extract features from the three modalities and Transformer decoders to translate the visual and audio modalities into the textual modality. The translated visual and audio features are then fused with the encoded textual features to form the missing joint feature. This missing joint feature is subsequently encoded, and a model pre-trained on complete modalities supervises the encoding so that the missing joint feature approximates the complete joint feature. A Transformer is also used to capture long-term dependencies within the multimodal joint features, and the output of its encoder is used for the final sentiment classification. Experimental results demonstrate that the proposed method achieves significant performance improvements under uncertain missing modalities. A code sketch of this translation-and-supervision scheme follows the one below.
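To make the hybrid fusion strategy in 1) concrete, the following is a minimal PyTorch sketch, not the thesis's implementation: the feature dimensions, the cross-attention form of the bimodal fusion, the attention-weighted sum used for trimodal attention fusion, and the averaging of class probabilities at the decision level are all illustrative assumptions.

```python
# Minimal sketch of a BiGRU + attention hybrid fusion model (illustrative assumptions only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class BimodalAttentionFusion(nn.Module):
    """Fuses two unimodal sequences with cross attention (assumed form)."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, x, y):
        fused, _ = self.attn(query=x, key=y, value=y)   # x attends to y
        return fused.mean(dim=1)                        # pool over time


class HybridFusionMSA(nn.Module):
    def __init__(self, d_text=300, d_audio=74, d_visual=35, hidden=64, n_classes=2):
        super().__init__()
        # BiGRU encoders, one per modality; outputs have size 2*hidden (bidirectional).
        self.gru_t = nn.GRU(d_text, hidden, batch_first=True, bidirectional=True)
        self.gru_a = nn.GRU(d_audio, hidden, batch_first=True, bidirectional=True)
        self.gru_v = nn.GRU(d_visual, hidden, batch_first=True, bidirectional=True)
        d = 2 * hidden
        # Pairwise (bimodal) attention fusion modules.
        self.fuse_ta = BimodalAttentionFusion(d)
        self.fuse_tv = BimodalAttentionFusion(d)
        self.fuse_av = BimodalAttentionFusion(d)
        # Trimodal attention: scores the three bimodal features before a weighted sum.
        self.tri_attn = nn.Linear(d, 1)
        # Two classifiers, one per set of trimodal features.
        self.clf_attn = nn.Linear(d, n_classes)
        self.clf_cat = nn.Linear(3 * d, n_classes)

    def forward(self, text, audio, visual):
        t, _ = self.gru_t(text)      # (B, T, 2*hidden)
        a, _ = self.gru_a(audio)
        v, _ = self.gru_v(visual)
        # Feature-level fusion: bimodal first, then trimodal.
        ta, tv, av = self.fuse_ta(t, a), self.fuse_tv(t, v), self.fuse_av(a, v)
        pairs = torch.stack([ta, tv, av], dim=1)          # (B, 3, d)
        w = torch.softmax(self.tri_attn(pairs), dim=1)    # trimodal attention weights
        tri_attn_feat = (w * pairs).sum(dim=1)            # trimodal attention fusion
        tri_cat_feat = torch.cat([ta, tv, av], dim=-1)    # trimodal concatenation fusion
        # Decision-level fusion: average the two classifiers' class probabilities.
        p1 = F.softmax(self.clf_attn(tri_attn_feat), dim=-1)
        p2 = F.softmax(self.clf_cat(tri_cat_feat), dim=-1)
        return (p1 + p2) / 2


# Usage with random tensors standing in for text/audio/visual feature sequences.
model = HybridFusionMSA()
probs = model(torch.randn(8, 20, 300), torch.randn(8, 20, 74), torch.randn(8, 20, 35))
print(probs.shape)  # torch.Size([8, 2])
```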
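Similarly, the following sketch illustrates the modality-translation idea in 2) under stated assumptions: inputs are taken to be pre-projected to a shared feature size, missing modalities arrive as zero-filled sequences, the encoded text serves as the decoder query during translation, and the supervision from the complete-modality pre-trained model is approximated here by an L2 loss toward a frozen teacher. None of these choices are claimed to match the thesis's implementation.

```python
# Minimal sketch of modality translation under uncertain missing modalities
# (illustrative assumptions only; names and losses are hypothetical).
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_encoder(d_model=128, layers=2):
    layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=layers)


def make_decoder(d_model=128, layers=2):
    layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
    return nn.TransformerDecoder(layer, num_layers=layers)


class TranslationMSA(nn.Module):
    def __init__(self, d_model=128, n_classes=2):
        super().__init__()
        # One Transformer encoder per modality (inputs assumed pre-projected to d_model).
        self.enc_t, self.enc_a, self.enc_v = make_encoder(), make_encoder(), make_encoder()
        # Decoders translate audio/visual features into the textual modality.
        self.a2t, self.v2t = make_decoder(), make_decoder()
        # Joint-feature encoder whose output should approximate the complete joint feature.
        self.joint_enc = make_encoder()
        self.clf = nn.Linear(d_model, n_classes)

    def forward(self, text, audio, visual):
        # Missing modalities are assumed to arrive as zero-filled sequences.
        t, a, v = self.enc_t(text), self.enc_a(audio), self.enc_v(visual)
        a_as_t = self.a2t(tgt=t, memory=a)    # translate audio into the text space
        v_as_t = self.v2t(tgt=t, memory=v)    # translate visual into the text space
        joint = t + a_as_t + v_as_t           # fuse into the "missing" joint feature
        joint = self.joint_enc(joint)         # capture long-term dependencies
        pooled = joint.mean(dim=1)
        return self.clf(pooled), pooled


def training_loss(model, teacher, missing_batch, complete_batch, labels):
    """Task loss plus supervision from a frozen model pre-trained on complete modalities."""
    logits, student_joint = model(*missing_batch)      # inputs with modalities dropped
    with torch.no_grad():
        _, teacher_joint = teacher(*complete_batch)    # teacher sees all modalities
    return F.cross_entropy(logits, labels) + F.mse_loss(student_joint, teacher_joint)
```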