Research On The Named Entity Recognition And Base Noun Phrase Identification

Posted on: 2011-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: W X Tan
GTID: 2155360305976429
Subject: Management Science and Engineering
Abstract/Summary:
Named Entity Recognition (NER) is the task of locating phrases in a document that denote named entities and classifying them into predefined categories, while Base Noun Phrase (Base NP) Identification is the task of finding noun phrases that have no recursive structure or post-modifiers. Both tasks are regarded as foundational steps in text processing and are important for deep natural language processing applications. Coreference Resolution, another crucial problem in deep language processing, has drawn increasing attention because of its importance to NLP tasks such as Machine Translation and Information Extraction. However, the performance of Coreference Resolution depends heavily on upstream components such as Named Entity Recognition and Part-of-Speech tagging.

Building on an analysis of related work, this thesis focuses on Named Entity Recognition and Base Noun Phrase Identification. Its contributions are as follows.

Firstly, taking the nested structure of named entities into account, this thesis adopts a cascaded CRF model for Named Entity Recognition. The lower model first recognizes person names as well as simple location and organization names. Its best results are then passed to the higher model, where they support the recognition of complex location and organization names. The cascaded model improves recognition performance as a result.

Secondly, this thesis addresses Base NP Identification with an error-based cascaded model. Atomic features are combined to capture as much context information as possible, and the combinations are evaluated experimentally. The results show that the two-level combined classifiers are more effective than approaches that use a single classifier.

Finally, this thesis makes a preliminary investigation of feeding the best recognition results of the above two tasks into an SVM-based Chinese Coreference Resolution platform, in place of rule-based preprocessing. Evaluation on the ACE 2005 Chinese corpus shows that this significantly improves the system's performance.
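The cascade in the first contribution hinges on how the lower model's output reaches the higher model. The abstract does not give the feature templates, so the following Python sketch assumes a common arrangement: the lower layer's BIO tags are added as extra per-character features for the upper layer. The function name `upper_layer_features`, the window size, and the tag set are illustrative assumptions, not the thesis's actual design. The example uses 北京大学 (Peking University), an organization name that contains the location 北京, so the lower layer can mark the nested location and leave the full organization to the upper layer.

```python
from typing import Dict, List

PAD = "<PAD>"  # filler for positions outside the sentence


def upper_layer_features(chars: List[str], lower_tags: List[str],
                         window: int = 1) -> List[Dict[str, str]]:
    """Hypothetical helper: per-character features for the upper CRF layer.

    Assumes the lower layer has already tagged each character with BIO
    labels (e.g. B-PER, B-LOC, I-ORG, O); those labels become features.
    """
    if len(chars) != len(lower_tags):
        raise ValueError("chars and lower_tags must have the same length")
    n = len(chars)
    sentence_feats = []
    for i in range(n):
        feats: Dict[str, str] = {}
        for off in range(-window, window + 1):
            j = i + off
            inside = 0 <= j < n
            feats[f"C[{off}]"] = chars[j] if inside else PAD       # context character
            feats[f"L[{off}]"] = lower_tags[j] if inside else PAD  # lower-layer label
        # Conjunction of the current character with its lower-layer label.
        feats["C[0]|L[0]"] = f"{chars[i]}|{lower_tags[i]}"
        sentence_feats.append(feats)
    return sentence_feats


# Lower layer marks the nested location 北京; the upper layer would be
# trained to label the whole organization 北京大学 (B-ORG I-ORG I-ORG I-ORG).
chars = list("北京大学学生")
lower = ["B-LOC", "I-LOC", "O", "O", "O", "O"]
for f in upper_layer_features(chars, lower):
    print(f["C[0]"], f["L[-1]"], f["L[0]"], f["L[1]"])
```

Lists of per-token feature dicts of this shape are the input format accepted by CRF toolkits such as sklearn-crfsuite; the abstract does not say which toolkit the thesis used.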
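The second contribution combines atomic features to capture context. The specific templates are not listed in the abstract, so this sketch assumes the usual atomic features for chunking (words and POS tags in a small window) and forms pairwise conjunctions of selected ones. The function names, the window size, and the Penn Chinese Treebank-style tags (NR, NN, VA) in the example are hypothetical choices for illustration. It covers only feature construction, not the error-based two-level cascade itself.

```python
from itertools import combinations
from typing import Dict, List

PAD = "<PAD>"  # filler for positions outside the sentence


def atomic_features(words: List[str], pos: List[str], i: int,
                    window: int = 2) -> Dict[str, str]:
    """Hypothetical atomic features: words and POS tags in a +/- window."""
    n = len(words)
    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):
        j = i + off
        inside = 0 <= j < n
        feats[f"W[{off}]"] = words[j] if inside else PAD
        feats[f"P[{off}]"] = pos[j] if inside else PAD
    return feats


def combine(atomic: Dict[str, str], keys: List[str]) -> Dict[str, str]:
    """Add a pairwise conjunction for every pair of the selected atomic keys."""
    out = dict(atomic)
    for a, b in combinations(keys, 2):
        out[f"{a}&{b}"] = f"{atomic[a]}&{atomic[b]}"
    return out


words = ["中国", "经济", "发展", "迅速"]
pos = ["NR", "NN", "NN", "VA"]
feats = combine(atomic_features(words, pos, 1, window=1),
                ["P[-1]", "P[0]", "P[1]"])
print(feats["P[-1]&P[0]"], feats["P[0]&P[1]"])  # prints: NR&NN NN&NN
```

A linear model such as a CRF or SVM scores each feature independently, so conjunctions like `P[-1]&P[0]` are one way to let it see tag patterns (here a proper noun followed by a common noun) that no single atomic feature expresses.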
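For the final contribution, the recognizers' output has to be turned into something the coreference system can consume. Assuming (the abstract does not say so) that both the NER and Base NP taggers emit BIO labels, a minimal decoder converts each tag sequence into spans that could serve as candidate mentions. `bio_to_spans` is a hypothetical helper, and treating a stray `I-` tag as the start of a new span is one design choice among several.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Hypothetical helper: decode BIO tags into (start, end_exclusive, type) spans."""
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags):
        # Close the open span on O, on a new B-, or on an I- of a different type.
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Tolerate an I- tag with no preceding B- by opening a new span.
            start, etype = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans


ner = bio_to_spans(["B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O"])
print(ner)  # [(0, 4, 'ORG')]
```

Spans from the two taggers could then be merged and deduplicated before pairwise features are extracted for the SVM; how the thesis actually integrates them is not described in the abstract.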
Keywords/Search Tags: Coreference Resolution, Named Entity, Base Noun Phrase, Cascaded Conditional Random Fields Model