
Research On Entity Recognition Of Person Names In Uyghur Text Corpus

Posted on: 2017-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: S F L T N Z M D Ta
GTID: 2348330503484331
Subject: Engineering, information and communication engineering
Abstract/Summary:
Named entity recognition refers to the recognition of named entities in text. As a basic task of Natural Language Processing, it is widely and successfully used in information extraction, information retrieval, information recommendation, machine translation and other tasks. Because the person is usually the main subject of an event, person name recognition is an important subtask of named entity recognition. Based on the agglutinative characteristics of the Uyghur language, we split each Uyghur word into units at different levels, such as syllable, suffix and stem, which significantly reduces the data sparsity problem. Since Han Chinese person names account for most of the remaining errors of the CRF-based approach, we also propose a rule-based post-processing approach for recognizing Han Chinese person names in Uyghur text. Experimental results show that this cascaded approach achieves satisfactory performance: precision, recall and F1 score are 87.47%, 89.12% and 88.29% respectively. In addition, we run comparative experiments with a variety of statistical models on three types of named entity: person names, location names and organization names. The comparison shows that the conditional random field model is the best of these statistical models for Uyghur named entity recognition. We also use external dictionaries of person names, place names and organization-name feature words to improve recognition. The results of this work can also be applied to the identification of other Uyghur named entities and to related text categorization tasks.
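The reported F1 score can be checked against the other two figures. The sketch below assumes that the "accuracy" figure (87.47%) is precision in the usual sense and that F1 is the standard harmonic mean of precision and recall. The function name `f1_score` is chosen here for illustration and does not come from the thesis.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent (precision is an assumption:
# the abstract calls it "recognition accuracy").
precision, recall = 87.47, 89.12
print(f"F1 = {f1_score(precision, recall):.2f}%")
```

Under these assumptions the computed value rounds to 88.29%, which matches the F1 score stated in the abstract.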
Keywords/Search Tags: Uyghur Language Processing, Person Name Recognition, Conditional Random Field, Syllable Bank