Research Into Chinese Names Recognition Based On Learning By Analogy

Posted on: 2008-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: L L Zhu
Full Text: PDF
GTID: 2178360242969502
Subject: Computer application technology
Abstract/Summary:
With the growing need to process massive amounts of information automatically by computer, technologies such as information retrieval, information extraction, machine translation, and automatic summarization have emerged. Named entity recognition is a prerequisite for automatic text processing, and its quality directly affects all subsequent steps. Although named entity recognition technology has reached a high level, evaluation results show that Chinese named entity recognition is still far from practical use, because technology, resources, and application requirements have not yet been combined effectively.

Chinese personal names, as one kind of named entity, form an open and extensible class, and their forms vary widely, so recognizing them is difficult. Chinese named entity recognition, including personal name recognition, has become the main bottleneck in applying lexical analysis. Personal name recognition is therefore an important and difficult problem in Chinese natural language processing research.

We propose a method of Chinese personal name recognition based on learning by analogy, taking personal names as the object of study. In this paper, we use example-vectors of Chinese personal names to describe language phenomena, and the results are satisfactory. The main work of this paper consists of four parts:

1. We collect statistics from the corpora and analyze the inner features and context features of personal names. The results provide the linguistic basis for the method. At the same time, we build personal name resource bases from the corpora, including a family-name-word base, a given-name-word base, a translated-name-word base, a feature-word base, and so on. In addition, the feature words are semantically expanded using HOWNET.

2. We create a base of personal name examples. When creating example-vectors, we consider not only the inner features of personal names but also their context features. That is, each example-vector consists of an inner-feature vector and a context-feature vector, so that the feature information is fully used.

3. Following a recognition strategy based on learning by analogy, we design and develop a test system for Chinese personal name recognition. By computing the similarity between example-vectors, we select the example with the highest similarity and recognize personal names by matching against it.

4. We propose an improved method for computing example-vector similarity. It is a two-level method that depends on the stage of personal name recognition: when the example base is created, similarity is weighted by common words; during recognition, similarity is weighted by both common words and feature information.

We tested the method on a 0.5M-word corpus drawn from People's Daily. The experimental results show a recall of 90.86% and a precision of 86.45%, which indicates that Chinese personal name recognition based on learning by analogy is effective and feasible.
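
The abstract does not give the similarity formula, so the following is only a minimal sketch of how a two-level similarity over example-vectors and the choice of the most similar example might look. The names `ExampleVector`, `overlap`, `similarity`, and `most_similar`, the use of Jaccard overlap, the equal weighting of inner and context features, and the `feature_weight` bonus are all assumptions made for illustration; they are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleVector:
    """Hypothetical example-vector: inner features plus context features."""
    inner: frozenset      # e.g. the characters of the candidate string
    context: frozenset    # e.g. the words adjacent to the candidate
    is_name: bool = True  # label stored with the example


def overlap(a, b):
    """Jaccard overlap of two feature sets (an assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(x, y, feature_words=frozenset(), feature_weight=0.0):
    """Assumed two-level similarity.

    Level 1 (feature_weight == 0): shared words only, as when the
    example base is built.
    Level 2 (feature_weight > 0): shared words plus a bonus when both
    contexts share a known feature word, as during recognition.
    """
    common = 0.5 * overlap(x.inner, y.inner) + 0.5 * overlap(x.context, y.context)
    shares_feature = bool(x.context & y.context & feature_words)
    return common + (feature_weight if shares_feature else 0.0)


def most_similar(candidate, examples, feature_words, feature_weight=0.2):
    """Return the stored example with the highest level-2 similarity."""
    return max(
        examples,
        key=lambda e: similarity(candidate, e, feature_words, feature_weight),
    )


# Illustrative data only; not drawn from the thesis corpus.
feature_words = frozenset({"主席", "先生"})
name_example = ExampleVector(frozenset({"张", "伟"}), frozenset({"主席", "说"}), True)
non_name_example = ExampleVector(frozenset({"张", "开"}), frozenset({"把", "门"}), False)
candidate = ExampleVector(frozenset({"张", "明"}), frozenset({"主席", "说"}))

best = most_similar(candidate, [name_example, non_name_example], feature_words)
print(best.is_name)
```

In this sketch the candidate takes the label of the closest stored example, which is one simple reading of "recognition by matching"; a real system would likely also apply a similarity threshold before accepting a match.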
Keywords/Search Tags: Natural Language Processing, Personal Names Recognition, Learning by Analogy, Similarity Computing