As an important subtask of information extraction, event extraction has significant research value. With the development of multimedia technology, multimodal data have become increasingly common in application scenarios, and the task of multimodal event extraction is gradually emerging. However, current research on event extraction focuses mainly on the text modality, while research on multimodal event extraction remains relatively rare. In view of this situation, this dissertation conducts research on three aspects: the construction of a multimodal event corpus, multimodal event extraction with fusion of fine-grained image information, and multimodal event extraction based on multi-layered information. The specific content is as follows:

(1) Construction of a Multimodal Event Corpus

Research on multimodal event extraction is still in its early stages, and corpora for the task are scarce. To address this issue, this dissertation constructs a multimodal event corpus (TIE). First, the content to be labelled is determined. Then, to improve the speed and quality of labelling, a detailed labelling process and labelling criteria are formulated, labelling tools are developed, and quality assurance strategies are proposed. Finally, this dissertation annotates the multimodal event corpus, achieving inter-annotator agreement with an average kappa value of 0.68. Preliminary experiments on TIE verify the usability of the corpus.

(2) Multimodal Event Extraction with Fusion of Fine-grained Image Information

Previous research on multimodal event extraction has used image information only at a coarse-grained level, ignoring the importance of fine-grained image information. Visual objects, as fine-grained image information, usually correspond to event arguments. In addition, certain categories of objects are often associated with certain types of events, so visual objects can indicate the occurrence of an event. Visual object information therefore plays an important role in event extraction. Based on this observation, a multimodal event extraction model with fusion of fine-grained image information is proposed in this dissertation. In addition to textual features, the model uses an interactive attention mechanism to obtain coarse-grained image features and fine-grained object features from images, making full use of image information for event extraction. Experimental results on TIE demonstrate the effectiveness of this model.
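To make this fusion concrete, the following is a minimal sketch of how token features might be combined with coarse-grained image features and fine-grained object features via cross-modal attention. The tensor shapes, module names, and the specific attention formulation are illustrative assumptions, not the dissertation's actual architecture.

```python
import torch
import torch.nn as nn

class InteractiveAttentionFusion(nn.Module):
    """Illustrative sketch: fuse token features with coarse-grained
    image features and fine-grained object features via cross-modal
    attention. Shapes and design choices are assumptions, not the
    dissertation's actual model."""

    def __init__(self, dim=768):
        super().__init__()
        # Text tokens attend over image regions and detected objects.
        self.img_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.obj_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, text_feat, img_feat, obj_feat):
        # text_feat: (B, T, dim)  token features, e.g. from a text encoder
        # img_feat:  (B, R, dim)  coarse-grained features, e.g. CNN grid regions
        # obj_feat:  (B, O, dim)  fine-grained features, e.g. detected objects
        img_ctx, _ = self.img_attn(text_feat, img_feat, img_feat)
        obj_ctx, _ = self.obj_attn(text_feat, obj_feat, obj_feat)
        # Concatenate the per-token views and project back to model width;
        # the fused features would feed trigger / argument classifiers.
        return self.fuse(torch.cat([text_feat, img_ctx, obj_ctx], dim=-1))

# Usage with dummy tensors: batch of 2, 32 tokens, 49 regions, 10 objects.
fusion = InteractiveAttentionFusion()
out = fusion(torch.randn(2, 32, 768), torch.randn(2, 49, 768), torch.randn(2, 10, 768))
print(out.shape)  # torch.Size([2, 32, 768])
```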
(3) Multimodal Event Extraction Based on Multi-Layered Information

In previous work, the information interaction between the two modalities did not fully exploit the contextual information of the image and the text, so models could not handle instances where the image and the text did not match. To address this problem, this dissertation proposes a multimodal event extraction model based on multi-layered information. The model uses an attention mechanism to guide the interaction between the two modalities, allowing it to focus on the more salient information shared by image and text. In addition, the model uses a GCN to enable a deeper interaction between image and text, so that the cross-modal interaction fully exploits contextual information and the model can better cope with situations where the image and the text do not match (a sketch of such a design follows the conclusion below). Experimental results on TIE demonstrate the effectiveness of the model.

In conclusion, this dissertation constructs a multimodal event corpus (TIE) and proposes effective multimodal event extraction models for the problems identified above, making preliminary attempts at research on multimodal event extraction.
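The following sketch illustrates one way the multi-layered interaction described in (3) could be realised: attention first aligns the two modalities at a shallow level, and a graph convolution then propagates information over a joint text-image graph for deeper interaction. The graph construction, thresholding rule, and module names are assumptions made for illustration, not the dissertation's actual design.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Single graph convolution, H' = ReLU(D^-1 A H W), applied to a
    joint graph whose nodes are text tokens and image regions."""

    def __init__(self, dim=768):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, nodes, adj):
        # nodes: (B, N, dim); adj: (B, N, N) with self-loops included.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)  # row degrees
        return torch.relu(self.proj((adj / deg) @ nodes))   # mean aggregation

class MultiLayeredInteraction(nn.Module):
    """Illustrative sketch: attention-guided shallow interaction followed
    by GCN-based deep interaction. Names and shapes are assumptions."""

    def __init__(self, dim=768):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.gcn = GCNLayer(dim)

    def forward(self, text_feat, img_feat):
        # Shallow layer: text attends to image regions, highlighting
        # the information the two modalities share.
        aligned, attn = self.cross_attn(text_feat, img_feat, img_feat)
        nodes = torch.cat([text_feat + aligned, img_feat], dim=1)  # (B, T+R, dim)
        # Deep layer: connect each token to the regions it attends to
        # most (thresholded attention) plus self-loops, then run the GCN
        # so context flows across the joint text-image graph.
        B, T, R = attn.shape
        N = T + R
        adj = torch.eye(N).expand(B, N, N).clone()
        adj[:, :T, T:] = (attn > attn.mean(dim=-1, keepdim=True)).float()
        adj[:, T:, :T] = adj[:, :T, T:].transpose(1, 2)
        return self.gcn(nodes, adj)

# Usage with dummy tensors: 32 text tokens and 49 image regions.
model = MultiLayeredInteraction()
out = model(torch.randn(2, 32, 768), torch.randn(2, 49, 768))
print(out.shape)  # torch.Size([2, 81, 768])
```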