| A meme is a combination of image and text, usually with the words embedded in the image, and is widely used on social media because of its diverse content and rich emotional features. Memes are not only a medium for conveying information or disrupting sociopolitical situations but also a major source of shared humor and laughter. They have become an integral part of everyday life and play a vital role in people’s sociopolitical, cultural, and behavioral contexts. Although memes usually express sarcasm or humor, offensive, threatening, and hateful memes still spread on social networks, so detecting them automatically would help reduce their harmful social impact. Sentiment analysis of memes is challenging because of their multimodal nature. This thesis studies the compositional characteristics of memes and the difficulties they raise in three areas: image feature extraction, text feature extraction, and feature fusion. The research is presented in two parts, first a meme detection model and then a meme sentiment analysis model. For meme detection, this thesis proposes a model based on the single-stream architecture of UNITER. The model extracts key image features such as layout and content, processes the input text to obtain text features, and applies the self-attention mechanism to the combined image and text input to achieve fine-grained alignment between words and image regions, which improves feature fusion. For meme sentiment analysis, this thesis proposes a model based on a two-stream architecture with cross-modal attention. The model jointly attends to information from different representation subspaces of the image, which gives it stronger representational ability. By combining global vocabulary-level information with the contextual information of the sentence, it also captures the semantics of the corresponding sentence more accurately. A dual-branch multi-head cross-modal attention mechanism then fuses the image and text features so that the text and its corresponding image interact fully, which helps distribute the weights assigned to each modality. For the meme detection task, this thesis uses the Dank Memes 2020 Meme Detection dataset; for the meme sentiment analysis task, it uses the Dank Memes 2020 Hate Speech Identification and Troll Meme classification in Tamil datasets. The proposed methods achieved the best results in their respective experimental tasks, which demonstrates their effectiveness. |
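
The dual-branch multi-head cross-modal attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names (`multi_head_cross_attention`, `dual_branch_fusion`), the random projection matrices standing in for learned weights, and the mean-pool-then-concatenate fusion step are all assumptions made for illustration. It shows only the general idea that one branch lets text tokens attend to image regions while the other lets image regions attend to text tokens.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(queries, context, num_heads, rng):
    """One attention branch: rows of `queries` attend to rows of `context`.

    queries: (n_q, d) features of one modality
    context: (n_c, d) features of the other modality
    Returns the attended features (n_q, d) and the attention
    weights (num_heads, n_q, n_c).
    """
    n_q, d = queries.shape
    n_c = context.shape[0]
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    d_head = d // num_heads

    # Assumption: random matrices stand in for learned projection weights.
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
    )

    def split_heads(x, n):
        # (n, d) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(queries @ w_q, n_q)
    k = split_heads(context @ w_k, n_c)
    v = split_heads(context @ w_v, n_c)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (num_heads, n_q, d_head)

    # Merge the heads back into one feature vector per query row.
    merged = heads.transpose(1, 0, 2).reshape(n_q, d)
    return merged @ w_o, weights


def dual_branch_fusion(text_feats, image_feats, num_heads=4, seed=0):
    """Run both branches and fuse their outputs.

    Assumption: fusion is mean pooling per branch followed by
    concatenation; the abstract does not specify the fusion step.
    """
    rng = np.random.default_rng(seed)
    text_attended, text_to_image = multi_head_cross_attention(
        text_feats, image_feats, num_heads, rng
    )
    image_attended, image_to_text = multi_head_cross_attention(
        image_feats, text_feats, num_heads, rng
    )
    fused = np.concatenate(
        [text_attended.mean(axis=0), image_attended.mean(axis=0)]
    )
    return fused, text_to_image, image_to_text
```

With 6 text tokens and 9 image regions of dimension 32, calling `dual_branch_fusion(text, image)` returns a fused vector of length 64 and two attention maps, one per branch, whose rows each sum to 1. In a trained model the fused vector would be passed to a classifier head.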