Long Tail Entity Linking Based On Information Enhancement

Posted on: 2024-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Hu
GTID: 2568307052496174
Subject: Electronic information
Abstract/Summary:
Entity linking is crucial for many downstream natural language processing tasks. It links entity mentions in text to the correct entities in a knowledge base, and is therefore widely applied in scenarios such as information retrieval, intelligent question answering, and recommendation systems. In recent years, English-oriented entity linking has developed rapidly, and various methods have achieved very good results. Chinese entity linking, however, still faces major challenges and has not been fully explored. Compared with the rich English corpora, Chinese corpora are lacking in both quantity and quality, and their data distribution usually shows obvious long-tail characteristics, which unbalances model training. In response to these problems, this paper proposes an entity recognition module that introduces multi-task supervision information and an entity matching module that integrates rich graph information. These modules supplement the supervision signals and the text features of long-tail entities, respectively, thereby implicitly enhancing long-tail entity data and alleviating the model's neglect of tail entities. Experiments are designed to demonstrate the effectiveness of the method on both long-tail entity and full-sample entity linking datasets. The main work of this paper is as follows:

(1) This paper constructs an end-to-end two-stage entity linking model based on pre-trained language model encoding and builds a web application platform. The entity linking process is divided into two parts, entity recognition and entity disambiguation, and an end-to-end entity linking model is established. Comparative experiments, with training and testing on two Chinese datasets, verify the validity of the entity linking model. In addition, a web application platform is built on which the model and method proposed in this paper can be used and tested.

(2) This paper reduces the impact of insufficient Chinese labeled data on entity linking by introducing a Chinese word segmentation task for multi-task supervised learning. Because Chinese word segmentation and entity recognition rely on similar word boundary information, multi-task supervised learning lets entity recognition learn the word boundary information of the word segmentation task, which reduces the impact of scarce labeled data and improves the performance of named entity recognition.

(3) This paper proposes an entity matching method for long-tail data that fuses knowledge graph information. To address the long-tail distribution of the data in the datasets, a dictionary of high-frequency types in the knowledge base is built, and the fine-grained entity type information of the knowledge base is integrated to enhance entity matching, so that the model can better match long-tail entities.

To sum up, in order to address the scarcity of Chinese annotated data and the poor entity linking performance caused by the long-tail distribution of the datasets, this paper proposes an end-to-end two-stage entity linking model based on a pre-trained language model, running from entity recognition to entity disambiguation. It then uses multi-task supervised learning to obtain the word boundary information of Chinese word segmentation and improve entity recognition. Finally, it learns information external to the datasets by fusing the entity information in the knowledge graph, so as to match long-tail entity mentions more accurately. Experiments on two Chinese datasets verify the effectiveness of the proposed model on Chinese data with a long-tail distribution.
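
The type-information fusion described in point (3) can be illustrated with a minimal sketch. The abstract does not specify how the high-frequency type dictionary is built or how type information is combined with the entity text, so everything below is an assumption made for illustration: the toy knowledge base `KB`, the helper names `build_high_frequency_types` and `enrich_candidate_text`, the `top_k` cutoff, and the choice to map rare types to a generic "Other" label.

```python
from collections import Counter

# Hypothetical toy knowledge base; the field names are illustrative only.
KB = {
    "E1": {"name": "Apple", "types": ["Company"], "description": "Technology company."},
    "E2": {"name": "Apple", "types": ["Fruit"], "description": "Edible fruit."},
    "E3": {"name": "Huawei", "types": ["Company"], "description": "Telecom company."},
    "E4": {"name": "Xiaomi", "types": ["Company"], "description": "Electronics maker."},
    "E5": {"name": "Pear", "types": ["Fruit"], "description": "Edible fruit."},
    "E6": {"name": "Qilin", "types": ["MythicalCreature"], "description": "Legendary creature."},
}


def build_high_frequency_types(kb, top_k):
    """Return the set of the top_k most frequent entity types in the knowledge base."""
    counts = Counter(t for entity in kb.values() for t in entity["types"])
    return {t for t, _ in counts.most_common(top_k)}


def enrich_candidate_text(entity, frequent_types):
    """Prepend type information to a candidate's text before it is scored.

    Assumption: types outside the high-frequency dictionary collapse to "Other",
    so rare (long-tail) entities still carry a coarse type signal.
    """
    kept = [t for t in entity["types"] if t in frequent_types]
    type_text = ", ".join(kept) if kept else "Other"
    return f"{entity['name']} [type: {type_text}] {entity['description']}"


frequent = build_high_frequency_types(KB, top_k=2)
for entity_id in ("E1", "E2", "E6"):
    print(entity_id, enrich_candidate_text(KB[entity_id], frequent))
```

In this sketch the enriched string would be fed to the matching model alongside the mention context, so that two candidates sharing the name "Apple" differ in their type tag even when their descriptions are short.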
Keywords/Search Tags: Entity Linking, Named Entity Recognition, Long Tail Features, Multi-task Learning, Knowledge Graph