Research On The Methods Of Chinese Noun Compounds Identification And Classification

Posted on: 2008-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: H J Zhu
GTID: 2155360245496826
Subject: Computer Science and Technology
Abstract/Summary:
Noun Compounds (NC), a common grammatical phenomenon in language, have attracted growing interest in Natural Language Processing over the past few years. Current research covers boundary identification, syntactic analysis, semantic analysis and classification. This thesis contributes to defining the problem domain of Chinese Noun Compounds, NC boundary identification, NC type identification, the integrated analysis of Noun Compounds and Named Entities, and applications of Noun Compounds.

The first part of the thesis studies NC boundary identification. Three methods are applied, and by analyzing their results on the development set the best boundary recognition model is found to be a Maximum Entropy model based on candidate sets. Using a feature template of 26 features drawn from internal knowledge (the attributes inside the chunk) and external knowledge (the context in which the phrase occurs), the trained model reaches an F-value of 89.2% on the test set.

The second part of the thesis studies NC classification. A Chinese NC classification system is constructed based on the semantic features of Chinese NCs and their use in language analysis. Notably, a phrase-level Named Entity that satisfies the NC definition can be treated entirely as an NC, which provides the theoretical foundation for the integrated analysis system in later chapters. Because phrase recognition is based on identifying the phrase itself, the thesis approaches the problem in two ways: identifying boundary and type jointly, and classifying on top of boundary identification. The results show that joint identification lowers NC identification accuracy, whereas classification based on NC boundaries maintains high accuracy and improves the overall result.

The last part studies the integrated analysis of NCs and Named Entities. Named Entities closely resemble NCs in compositional structure, syntactic and semantic features, and application areas, and phrase-level Named Entities form a subset of NCs, so recognition of phrase-level Named Entities can rely on NC classification. The thesis also introduces a variety of extended Named Entities and applies them in a running Information Extraction system, which achieves good results.

For each of these topics, the problem is approached from multiple perspectives and with multiple models in order to understand it more deeply, optimize model selection and build the most suitable NC analysis platform.
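
The abstract reports an F-value for boundary identification but does not say how it is computed. As a minimal sketch, assuming exact-match scoring of NC spans and the balanced F1 measure (both are assumptions, and the function name and sample spans below are hypothetical), the evaluation could look like this:

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end) spans.

    Assumption: a predicted noun compound counts as correct only if
    both of its boundaries match a gold span exactly.
    """
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical token-offset spans for one sentence.
gold = [(0, 2), (5, 7), (9, 12)]
predicted = [(0, 2), (5, 8), (9, 12)]  # the second span has a wrong right boundary

precision, recall, f1 = span_prf(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Under exact matching, a span with one wrong boundary counts as both a false positive and a missed gold span, which is why boundary errors reduce precision and recall together.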
Keywords/Search Tags: Noun Compounds, boundary identification, type identification, Named Entity, Maximum Entropy model