Study On Chinese Name Entity Recognition And Some Related Issues

Posted on: 2011-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: X D Yang
Full Text: PDF
GTID: 2178330332972250
Subject: Computer application technology
Abstract/Summary:
Named entities are important information-carrying units of natural language, and Chinese named entity recognition is a key technology in Chinese information processing. With applications across many fields, it is both a basic research topic and a problem of high practical value. At present, named entities with complex structure are recognized poorly, and many factors affect their recognition; although many methods have been proposed, each has shortcomings and no good solution exists yet. Recognition of person names and location names has been studied well, but results for organization names are not ideal. Given the complex structure of organization names, it is particularly important to determine how to obtain features, how to apply them in an appropriate model to recognize organization names, and how to remedy the deficiencies of existing methods.
First, this thesis presents a cascaded CRF model approach that treats text as a sequence of observations. A low-level CRF model first identifies person names and location names; its recognition results are then passed to the high-level model as additional observations for organization name recognition. Between the layers, rules are fused into the CRF approach to handle named entities with regular features. Because CRF training is time-consuming, we introduce a fast training method.
Then we analyze error examples for complex organization names and point out the contributing factors, one of which (named entities with multiple subject types) we study further. We first use the strength of the support vector machine model in binary classification to recognize multi-type person names and location names. Abbreviations are also difficult to identify. We analyze the factors affecting organization name recognition and give a method for identifying abbreviations that combines a boot table built from context with matching against the formation of the full name, based on the observation that the morphological characteristics of the context of a full name are the same as those of its abbreviation.
Finally, we report experiments on location names, organization names, multi-type person and location names, and abbreviations. First, we compare CRF-based models across different corpus sizes, templates, part-of-speech features, and so on, and compare the new method with existing methods. Then we describe the experimental results for the training optimization and the rules to check the validity of the optimization methods. Finally, we describe and analyze the results for multi-type person name and location name recognition and for abbreviation recognition. The experiments show that our methods are effective for Chinese information processing and have some significance.
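The cascaded idea described in the abstract, where low-level labels become observations for the high-level layer, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, feature keys, tag set (PER / LOC / O), and the example sentence are all assumptions made for illustration, and no CRF training is performed here. The sketch only shows how the output of a low-level tagger might be folded into the feature set handed to a high-level organization-name tagger.

```python
from typing import Dict, List


def high_level_features(tokens: List[str], low_tags: List[str], i: int) -> Dict[str, str]:
    """Build features for token i in the high-level (organization) layer.

    low_tags holds labels assumed to come from a low-level tagger
    (hypothetical tag set: PER / LOC / O). Feature keys are illustrative.
    """
    if len(tokens) != len(low_tags):
        raise ValueError("tokens and low_tags must have the same length")
    last = len(tokens) - 1
    return {
        # Current token and its low-level label.
        "tok": tokens[i],
        "low": low_tags[i],
        # Left neighbour, with a boundary marker at the sentence start.
        "tok-1": tokens[i - 1] if i > 0 else "<BOS>",
        "low-1": low_tags[i - 1] if i > 0 else "<BOS>",
        # Right neighbour, with a boundary marker at the sentence end.
        "tok+1": tokens[i + 1] if i < last else "<EOS>",
        "low+1": low_tags[i + 1] if i < last else "<EOS>",
    }


def sentence_features(tokens: List[str], low_tags: List[str]) -> List[Dict[str, str]]:
    """Feature dicts for every token of one sentence."""
    return [high_level_features(tokens, low_tags, i) for i in range(len(tokens))]


# Hypothetical segmented sentence with assumed low-level output.
tokens = ["北京", "大学", "位于", "北京"]
low_tags = ["LOC", "O", "O", "LOC"]
features = sentence_features(tokens, low_tags)
```

Feature dictionaries of this shape could be passed to any sequence labeler that accepts per-token feature maps. The design point is only that the high-level layer sees the low-level decision (for example, that "大学" is preceded by a LOC token) in addition to the raw text.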
Keywords/Search Tags: Chinese, named entity recognition, multi-subject types, abbreviation recognition, conditional random fields, support vector machines