Research And Implementation Of Entity Linking

Posted on: 2016-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: R Wang
Full Text: PDF
GTID: 2308330482960433
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, the amount of text on the Web is exploding. Making better use of this text and extracting more useful information from it is a major challenge for natural language processing. Entity Linking is the task of linking an entity mention to an entity in a knowledge base. It helps computers understand what an entity mention really refers to, and it benefits QA systems, sentiment computing, semantic analysis, knowledge base engineering and so on.

This paper studies Entity Linking and proposes a method for English Entity Linking based on context information and the ListNet ranking model. The proposed method uses context information to expand the mention name and searches Wikipedia for candidate entity names. A feature extraction method is then proposed to calculate the similarity between mention and entity. The ListNet ranking algorithm is used to rank the candidates, and the one with the highest score is chosen as the linked entity. Finally, NIL mentions are linked by a clustering method. A system called Improve2014 is implemented and tested on the KBP 2013 test data. Improve2014 achieves an F-score of 0.660, which is 0.162 higher than the baseline system BUPTTeam2013 and 0.092 higher than the median F-score of all teams in the KBP 2013 Entity Linking task.

This paper mainly contains the following contents:

Firstly, mention expansion is regarded as the inverse process of abbreviation and shortening. For mention expansion, this paper summarizes acronyms, abbreviations and shortenings and their regular patterns. The experimental results show that mention expansion can be solved simply when context co-reference relationships, external dictionaries and Web knowledge are fully used.
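The idea of treating mention expansion as the inverse of abbreviation can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `expand_acronym` and the matching rule (a run of capitalized context words whose initials spell the mention) are assumptions made for illustration, and the sketch ignores the co-reference, dictionary and Web resources mentioned above.

```python
import re


def expand_acronym(mention, context):
    """Return the first context phrase whose word initials spell `mention`.

    Hypothetical helper: returns None when no such phrase is found.
    """
    letters = mention.upper()
    if not letters.isalpha():
        return None
    words = re.findall(r"[A-Za-z]+", context)
    n = len(letters)
    # Slide a window of n words over the context.
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        # Require capitalized words so ordinary prose is less likely to match.
        capitalized = all(w[0].isupper() for w in window)
        initials = "".join(w[0] for w in window).upper()
        if capitalized and initials == letters:
            return " ".join(window)
    return None


text = "The World Health Organization (WHO) issued new guidance."
print(expand_acronym("WHO", text))
```

A rule this simple misses names that contain uncapitalized function words (for example "of"), which is one reason a fuller system would also draw on dictionaries and Web knowledge.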
The results show that the proposed method obtains relatively formal names, improves the recall of candidate retrieval and reduces distracting terms.

Secondly, this paper studies the representation of entity relationships, builds entity mention-candidate pairs, and extracts surface-string-level, entity-level and semantic-level features. Experiments on the effectiveness of the feature sets show that every feature has a positive impact on the performance of Improve2014, and that Improve2014 performs best when all features are combined. Adding the semantic associativity feature to the link probability feature yields a larger improvement than adding either of the other two features. Because an uneven distribution in the candidate ranking model may lead to a wrong answer, this paper ranks the list of candidates with ListNet, which achieves higher precision than the Ranking SVM model.

Finally, for entity clustering, this paper combines rules with the K-means clustering method. Compared with agglomerative clustering, the combined method achieves higher accuracy.
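For readers unfamiliar with ListNet, the listwise objective it optimizes can be sketched as follows. This is a minimal illustration of the standard top-one probability loss, not code from the thesis: the function names, the toy scores and the candidate names are all assumptions, and the scoring model that would produce `predicted_scores` from the extracted features is omitted.

```python
import math


def softmax(scores):
    """Turn raw scores into a top-one probability distribution."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_top_one_loss(predicted_scores, relevance_labels):
    """Cross entropy between the label and prediction top-one distributions."""
    p_true = softmax(relevance_labels)
    p_pred = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


def best_candidate(candidates, predicted_scores):
    """Pick the candidate entity with the highest predicted score."""
    return max(zip(candidates, predicted_scores), key=lambda pair: pair[1])[0]


# Toy example: the first candidate is the correct entity (label 1).
labels = [1.0, 0.0, 0.0]
good_scores = [2.0, 0.1, -1.0]   # correct candidate ranked first
bad_scores = [-1.0, 0.1, 2.0]    # correct candidate ranked last

print(listnet_top_one_loss(good_scores, labels) < listnet_top_one_loss(bad_scores, labels))
print(best_candidate(["Entity A", "Entity B", "Entity C"], good_scores))
```

The loss is computed over the whole candidate list at once, which is what distinguishes a listwise method such as ListNet from a pairwise one such as Ranking SVM.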
Keywords/Search Tags: Entity Linking, Wikipedia, Mention Expansion, Learning to Rank Algorithms, ListNet