Research On The Methods Of Chinese Noun Compounds Identification And Classification

Posted on:2008-09-20

Degree:Master

Type:Thesis

Country:China

Candidate:H J Zhu

GTID:2155360245496826

Subject:Computer Science and Technology

Abstract/Summary:

Noun compounds (NCs), a common grammatical phenomenon in language, have attracted growing interest in Natural Language Processing in recent years. Current research covers boundary identification, syntactic analysis, semantic analysis, and classification. This thesis contributes to defining the problem domain of Chinese noun compounds, NC boundary identification, NC type identification, the integrated analysis of NCs and Named Entities, and applications of NCs.

The first part of the thesis addresses NC boundary identification. Three methods are applied, and by analyzing their results on the development set, the best-performing boundary recognition model is selected: a Maximum Entropy model based on candidate sets. With a feature template of 26 features drawn from internal knowledge (attributes of the chunk itself) and external knowledge (the context in which the phrase occurs), the trained model reaches an F-value of 89.2% on the test set.

The second part of the thesis addresses NC classification. A Chinese NC classification system is constructed from the semantic features of Chinese NCs and their use in language analysis. Notably, under the NC definition, phrase-level Named Entities can be regarded entirely as NCs, which provides the theoretical foundation for the integrated analysis system in later chapters. Because recognizing a phrase's type depends on first identifying the phrase itself, the thesis studies two approaches: identifying boundary and type jointly, and classifying on top of boundary identification. The results show that joint identification lowers NC identification accuracy, whereas classification built on NC boundary identification maintains high accuracy and improves the results.

The last part addresses the integrated analysis of NCs and Named Entities. Named Entities closely resemble NCs in compositional structure, syntactic and semantic features, and application areas, and phrase-level Named Entities form a subset of NCs, so recognition of phrase-level Named Entities can rely on NC classification. In addition, the thesis introduces a variety of extended Named Entities and applies them in a running Information Extraction system, which achieves good results.

For each topic, we approach the problem from multiple perspectives and with multiple models to gain a deeper understanding of its nature, so as to optimize model selection and build the most suitable NC analysis platform.
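
The following is a minimal sketch of two ideas mentioned in the abstract: describing a candidate span with internal and external features, and scoring boundary identification with an F-value. It is an illustration under stated assumptions, not the thesis's implementation. The five features shown are hypothetical stand-ins (the actual 26-feature template is not given in the abstract), the function names `candidate_features` and `span_f_score` are invented for this example, and the F-value is assumed to be the balanced F1 over exact-match spans.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def candidate_features(tags: List[str], span: Span) -> Dict[str, str]:
    """Describe one candidate span with illustrative (hypothetical) features."""
    start, end = span
    return {
        # Internal knowledge: attributes of the candidate chunk itself.
        "first_tag": tags[start],
        "last_tag": tags[end - 1],
        "length": str(end - start),
        # External knowledge: the context surrounding the candidate.
        "prev_tag": tags[start - 1] if start > 0 else "<BOS>",
        "next_tag": tags[end] if end < len(tags) else "<EOS>",
    }


def span_f_score(gold: Set[Span], predicted: Set[Span]) -> float:
    """Balanced F1 over exact-match spans (assumed definition of the F-value)."""
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy sentence with made-up part-of-speech tags (n = noun, vn = verbal noun,
# d = adverb, a = adjective); the candidate span covers the first three tokens.
tokens = ["中文", "信息", "处理", "很", "重要"]
tags = ["n", "n", "vn", "d", "a"]
features = candidate_features(tags, (0, 3))

# One of two predicted spans matches one of two gold spans exactly.
score = span_f_score({(0, 3), (5, 7)}, {(0, 3), (4, 7)})
```

In a candidate-based setup, feature dictionaries like `features` would be passed to a Maximum Entropy (multinomial logistic regression) classifier that accepts or rejects each candidate span; the classifier itself is omitted here.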

Keywords/Search Tags:

Noun Compounds, boundary identification, type identification, Named Entity, Maximum Entropy model

Related items

1	Research On The Named Entity Recognition And Base Noun Phrase Identification
2	Research On English Clause Identification For Machine Translation System
3	Containing The Longest Noun Phrase Automatic Identification
4	The Automatic Identification Research Of Preposition "Dao" And Structure For Information Processing
5	The Boundary And Function Identification Research Of The "Suo" Construction In Modern Chinese Based On Modern Chinese Corpus
6	Figurative Identification In Crisis Coverage
7	The Study On The Generation Mechanism Of Mandarin Coordinate Compounds Of Noun-Noun Type
8	A Comparative Analysis Of The Use Of Modals In Identification Strategies From The Perspective Of Identification Theory
9	Research On Chinese-Vietnamese Entity Alignment Technology Based On Named Entity Recognition
10	A Study On The Audience’s Identification In English Translation Of Political Texts From The Perspective Of Translational Rhetoric