| Entity disambiguation is the key technology of entity linking. It aims to resolve the ambiguity of mentions in text so that the semantics of the text can be understood effectively. The technology has both academic value and practical significance in applications such as question answering systems, reading comprehension, public opinion monitoring, and knowledge graph construction and expansion. At present, most entity disambiguation methods use neural networks to learn feature information. They focus on mention context information and knowledge base information and construct multi-type, multi-modal representations of context and knowledge features, but most of them ignore the latent semantics of other mentions in the text and entity characteristics such as entity types and commonalities between entities. To address these problems, this thesis proposes a context-aware multi-feature entity disambiguation model to improve disambiguation accuracy. The main research contents of this thesis are as follows: (1) To address the problems that the information encoded in entity embeddings is too entity-specific and lacks common features, and that local context information is insufficient, an entity disambiguation sequence model that combines fine-grained semantic features with dynamic contextual semantic features is proposed. The model injects fine-grained semantic information into entity embeddings to improve the semantic correlation between entities. During disambiguation, the model tracks the previously disambiguated entities and uses them as a dynamic context that enriches the local context, strengthening the context features for efficient joint reasoning. (2) To address the problems that the types of some entities are predicted incorrectly and that two mentions in the same sentence may link to different entities, entity type features are further integrated into the entity embeddings, and a text encoder is introduced to learn representations of text and
entities, and a multi-relational model of mentions is established that induces relationships between mentions through latent variables. Because the soft/hard attention sequence model attends only to previously linked entities and ignores the relationship between the current mention and subsequent entities, it is optimized into a sequence graph attention model. Mentions are ordered by disambiguation difficulty so that easier mentions are input first, and the input nodes and relationships change dynamically according to the current mention state, so that more relevant entity feature information is used and the disambiguation ability of the model improves. Experiments on several mainstream public entity disambiguation datasets show that integrating entity popularity, fine-grained semantics, entity type features, and mention relationship features on top of contextual information and knowledge base description information, and optimizing the model with a sequence graph attention network, reduces interference from noisy information and effectively improves disambiguation accuracy. Compared with the representative DCA model, the model's accuracy on the in-domain AIDA-B dataset increases by 0.18%, the F1 increases by 0.33% on the cross-domain MSNBC dataset and by 2.02% on the AQUAINT dataset, the score on the CWEB dataset increases by 1.81%, and the F1 on the WIKI dataset increases by 2.99%, reflecting the good applicability of the model. |