
Research On Multimodal Entity Linking Method For Short Text

Posted on: 2022-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: P Y Li
GTID: 2558307070952489
Subject: Computer application technology
Abstract/Summary:
In recent years, with the boom of social media platforms such as Facebook, Twitter, and Weibo, the number of social media posts has grown massively. Because such posts are mostly short texts, they bring new opportunities and challenges to the entity linking task. Entity Linking (EL) is a fundamental task in natural language processing: by linking ambiguous mentions in unstructured text to the correct entities in a knowledge base, it supports information extraction, question answering, and knowledge base population. Specifically, entity linking for short texts faces the following challenges: (1) the texts are short, so the context of a mention carries little semantic information; (2) mentions are expressed in many different forms, so simple exact matching cannot achieve entity linking; (3) mentions are colloquial and noisy; (4) multimodal entity linking, with images as the representative additional modality, still has to be realized. To address these challenges, this thesis first investigates the two subtasks of entity linking, candidate generation and entity disambiguation, and proposes a multi-strategy candidate generation method and a multimodal entity disambiguation method for short texts. It then proposes a multimodal entity linking method for the task of microblog entity linking. The main work of this thesis is as follows.

(1) Research on a multi-strategy candidate generation method for short texts. A multi-strategy candidate generation method (MSCG) for short texts is proposed to address the varied and noisy mention expressions found in short texts. The method first combines three strategies for generating candidate sets, based on Wikipedia, a heuristic n-gram algorithm, and an improved Levenshtein algorithm, to improve the recall of short-text candidate generation. It then combines two methods for filtering candidate sets, a fallback mechanism and multi-feature candidate ranking, to reduce the number of entities in the candidate set while maintaining a high recall rate, which improves the overall efficiency of entity linking.

(2) Research on a multimodal entity disambiguation algorithm for short texts. A topic-fused multimodal entity disambiguation algorithm (TMED) is proposed for social media posts that are short and carry images. From the perspective of local disambiguation, the method obtains multimodal representations of the mention's text, character, and image features through a multimodal representation learning network and uses them to complete local entity disambiguation. From the perspective of global disambiguation, it uses entity concepts to reflect the topic information of the mention's context, which compensates for the limited contextual semantics of short texts. The final multimodal entity disambiguation combines local and global disambiguation, and the experimental results show that the proposed TMED algorithm achieves better disambiguation performance on multimodal datasets.

(3) Research on a multimodal entity linking algorithm for microblogs. Because Chinese microblogs have more diverse mention expressions and contain noisy images, a microblog-oriented multimodal entity linking algorithm (WB-MEL) is proposed. To handle the diverse expressions in Chinese microblogs, a pinyin-based candidate generation method is proposed, which exploits the characteristics of Chinese pinyin to improve the quality of text entity linking. To handle the noise caused by irrelevant images in microblogs, irrelevant images are filtered out by text-image correlation analysis in the multimodal joint disambiguation stage. Experiments on the Weibo-MEL dataset demonstrate that the WB-MEL algorithm achieves better entity linking performance on the microblog dataset.
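The candidate generation idea in (1) can be illustrated with a minimal Python sketch. It is not the thesis's MSCG implementation: the abstract does not specify the algorithm's details, so everything here is an assumption made for illustration. The `alias_table` dictionary stands in for a Wikipedia-derived alias dictionary, the edit distance is the classic Levenshtein distance rather than the improved variant, the "fallback" is interpreted as trying exact lookup first and using fuzzy matching only when it fails, and the thresholds `max_distance`, `min_overlap`, and `top_k` are hypothetical parameters.

```python
from typing import Dict, List, Set


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (not the thesis's improved variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Set of lowercase character n-grams; very short strings yield themselves."""
    text = text.lower()
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def generate_candidates(
    mention: str,
    alias_table: Dict[str, List[str]],  # hypothetical: lowercase alias -> entity ids
    max_distance: int = 2,              # assumed edit-distance threshold
    min_overlap: float = 0.4,           # assumed n-gram Jaccard threshold
    top_k: int = 3,                     # assumed candidate-set size limit
) -> List[str]:
    key = mention.lower()

    # Strategy 1: exact alias lookup. Assumed fallback interpretation:
    # fuzzy strategies run only when the exact lookup finds nothing.
    exact = alias_table.get(key)
    if exact:
        return exact[:top_k]

    # Strategies 2 and 3: n-gram overlap and edit distance over all aliases.
    scores: Dict[str, float] = {}
    mention_grams = char_ngrams(key)
    for alias, entities in alias_table.items():
        grams = char_ngrams(alias)
        overlap = len(mention_grams & grams) / len(mention_grams | grams)
        distance = levenshtein(key, alias.lower())
        if overlap < min_overlap and distance > max_distance:
            continue  # neither fuzzy strategy accepts this alias
        similarity = max(overlap, 1.0 - distance / max(len(key), len(alias)))
        for entity in entities:
            scores[entity] = max(scores.get(entity, 0.0), similarity)

    # Simple ranking stand-in: sort by similarity, then keep the top_k entities.
    ranked = sorted(scores, key=lambda e: (-scores[e], e))
    return ranked[:top_k]


alias_table = {
    "new york": ["New_York_City", "New_York_(state)"],
    "new york times": ["The_New_York_Times"],
    "facebook": ["Facebook"],
}
print(generate_candidates("Facebok", alias_table))
```

In this sketch the misspelled mention "Facebok" misses the exact lookup and is recovered by the fuzzy strategies, while truncating to `top_k` keeps the candidate set small. A real system would replace the single similarity score with the multi-feature ranking the abstract describes.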
Keywords/Search Tags: Short Text, Multimodal, Entity Linking, Candidate Generation, Entity Disambiguation