
Research On English Named Entity Extraction

Posted on: 2017-09-05  Degree: Master  Type: Thesis
Country: China  Candidate: Y Xue  Full Text: PDF
GTID: 2415330590988952  Subject: Applied statistics
Abstract/Summary:
Named entity extraction is the foundation of several natural language processing tasks, such as machine translation, automatic question answering, and coreference resolution. English entity extraction has been studied for more than 10 years, but its targets have largely been limited to the Person, Location, and Organization categories. Increasing the number of entity categories and improving recognition accuracy are therefore of great significance. In this paper, we first use entity linking to show that Wikipedia does not cover all real-world entities, so it cannot fully meet people's needs for searching and understanding knowledge. We then compare the performance of several models: the Hidden Markov Model, the Maximum Entropy Model, the Conditional Random Fields model, a noun phrase recognition model, Stanford University's named entity recognizer, Microsoft's named entity recognizer, and a Typeless entity recognition model. The empirical analysis shows that the Typeless NER model, which removes the entity-type labels and is trained with conditional random fields, performs best. To keep structural similarity between the training and test data from biasing the experimental results, we also validate our conclusion on five days of news data from 2014 and on short-text data from Microsoft. The data used in this paper are at the scale of millions and come primarily from web documents, which cover a great variety of writing styles and signals, so the proposed method is robust across language domains. This paper offers a new approach to English entity extraction; the proposed model, T-NER, has been put into practical use.
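As a rough illustration of what "removing the entity label" could mean in practice, the sketch below collapses typed sequence tags into typeless ones before a sequence model such as a CRF would be trained. It assumes a BIO tagging scheme (B-PER, I-ORG, O, and so on), which the abstract does not specify, and the function names `strip_entity_types` and `extract_spans` are hypothetical. It is not the thesis's implementation.

```python
def strip_entity_types(tags):
    """Collapse typed BIO tags (e.g. B-PER, I-ORG) to typeless ones (B, I, O).

    Assumption: tags follow the BIO scheme with a "-TYPE" suffix.
    """
    return [tag.split("-", 1)[0] for tag in tags]


def extract_spans(tokens, tags):
    """Recover entity mentions (as strings) from typeless BIO tags."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new entity starts; flush any entity still being built.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the entity that is currently open.
            current.append(token)
        else:
            # "O" (or a stray "I" with no open entity) closes the entity.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


tokens = ["Barack", "Obama", "visited", "Microsoft", "in", "Redmond"]
typed_tags = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-LOC"]

typeless_tags = strip_entity_types(typed_tags)
mentions = extract_spans(tokens, typeless_tags)
```

With typeless tags the model only has to decide where an entity begins and ends, not which category it belongs to, which is one plausible reason such a model can cover entities outside the Person, Location, and Organization categories.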
Keywords/Search Tags: entity extraction, conditional random fields, Typeless NER, knowledge graph