Research On Automatic Recognition For Base Verb Phrases In Mongolian Language

Posted on: 2006-05-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H B Y L Da
Full Text: PDF
GTID: 1115360155476853
Subject: Chinese Ethnic Language and Literature
Abstract/Summary:
Research on the automatic recognition of base verb phrases in Mongolian is basic research in linguistics and an important topic in Mongolian language information processing. The research consists of two parts: boundary determination and structure analysis. To derive a set of recognition rules, training set I (about 40 thousand words) and training set II (about 200 thousand words) were built from a one-million-word corpus of contemporary Mongolian. A base verb phrase set (VPset) containing 2751 base verb phrases was then built from the phrase structure analysis of all sentences in training set I (1501 sentences). Statistical analysis of all examples in VPset yielded the PT-structures (phrase type structures), SF-structures (syntactic functional structures), and related morphological features of base verb phrases, and a recognition preference based on structural features and statistical information is proposed. Based on the analysis of the examples in VPset, the dissertation discusses the subcategorization of the components of some base verb phrases and sets up a context information model. A set of construction rules for base verb phrases is derived under the reason-complete principle and the constraint-appropriate principle and formalized with the MBT (multiple feature binary tree) formal model. Using these formal rules, a recognition test is run on the base verb phrases in training sets I and II, and recall and precision are reported. Based on the test results, the potential ambiguity types in boundary determination and structure analysis are identified and methods for resolving them are discussed. Finally, a further test is run on the one-million-word corpus with the rules optimized by context information. The results are good relative to the current state of Mongolian language information processing.
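The abstract reports recall and precision but does not say how a recognized phrase is counted as correct. As a minimal sketch, assuming a phrase counts as correct only when both of its boundaries exactly match the gold annotation, the two measures could be computed as follows. The `Span` representation, the function name, and the sample data are hypothetical and are not taken from the dissertation.

```python
from typing import Set, Tuple

# Hypothetical representation: (sentence_id, start_token, end_token), end exclusive.
Span = Tuple[int, int, int]


def precision_recall(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall over phrase spans (assumed criterion)."""
    correct = len(gold & predicted)  # spans whose boundaries match exactly
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Illustrative data only; these spans are invented for the example.
gold = {(0, 1, 3), (0, 5, 7), (1, 0, 2)}
predicted = {(0, 1, 3), (0, 5, 6), (1, 0, 2), (1, 4, 6)}

p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under this assumption a span with one wrong boundary, such as `(0, 5, 6)` above, counts against both precision and recall, which is why boundary ambiguity matters for the reported scores.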
Keywords/Search Tags: Mongolian language, base verb phrase, rules for recognition, boundary determining, structure analysis