Research On Automatic Dependency Parsing For Contemporary Mongolian Language

Posted on: 2012-06-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L G L Si
Full Text: PDF
GTID: 1115330335973030
Subject: Chinese Ethnic Language and Literature
Abstract/Summary:
In this dissertation, building on research into traditional Mongolian grammar and on successful experience with syntactic parsing for other languages, we make full use of existing parsing methods and of results from Mongolian information processing to design and implement an automatic syntactic parsing system for Mongolian based on dependency grammar. The work falls into five parts:

1) After an in-depth study of the relations between words in Mongolian text, and with reference to syntactic annotation schemes that have worked well for other languages, we developed a Mongolian syntactic annotation scheme based on dependency grammar.
2) We developed a rule-based dependency parser for Mongolian. For describing rules we propose a multi-tag node description model. All static information used by the rule-based parser comes from machine dictionaries; to speed up dictionary lookup, we propose a data organization model for Mongolian based on finite state automata.
3) Using the annotation scheme for Mongolian dependency relations and the rule-based parser, and following a strategy of automatic analysis followed by manual proofreading, we built a Mongolian dependency tree-bank of about 50 million words.
4) Using the tree-bank as the training corpus, we developed a second dependency parser for Mongolian based on statistical methods, and then integrated the two parsers into a parser based on a hybrid strategy. The statistical parser uses lexical dependency probability models. To improve parsing speed, we pre-built a statistical information base that uses the same data organization model as the machine dictionaries.
5) We designed and implemented management software for the Mongolian dependency tree-bank. The software supports displaying, editing, searching and counting tree-bank content.

Finally, with the help of the dependency tree-bank and evaluation tools, we evaluated the rule-based parser, the statistical parser and the hybrid-strategy parser. Experimental results show that the hybrid-strategy parser performs well: its unlabeled annotation score, labeled annotation score and head-word annotation score reach 77.18%, 69.90% and 95.44%, respectively.
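The abstract does not define how the three evaluation scores are computed. As a rough illustration only, the sketch below assumes they correspond to the measures commonly used for dependency parsers: the share of words with the correct head (unlabeled), the share with both the correct head and the correct relation label (labeled), and whether the head word of the sentence is identified correctly. The function names, the relation labels and the toy sentence are hypothetical and are not taken from the dissertation's annotation scheme or tools.

```python
from typing import List, Tuple

# One token = (head_index, relation_label). Indices are 1-based;
# head_index 0 marks the head word (root) of the sentence.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (unlabeled, labeled) scores for one sentence."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    # Unlabeled: only the head index has to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # Labeled: head index and relation label both have to match.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    return head_hits / len(gold), full_hits / len(gold)


def head_word_correct(gold: List[Token], pred: List[Token]) -> bool:
    """True if the predicted sentence head(s) are the same tokens as in gold."""
    gold_roots = [i for i, (head, _) in enumerate(gold) if head == 0]
    pred_roots = [i for i, (head, _) in enumerate(pred) if head == 0]
    return gold_roots == pred_roots


# Hypothetical four-word, verb-final sentence with made-up labels.
gold = [(4, "subj"), (3, "attr"), (4, "obj"), (0, "root")]
pred = [(4, "subj"), (4, "attr"), (4, "adv"), (0, "root")]

unlabeled, labeled = attachment_scores(gold, pred)
print(unlabeled, labeled, head_word_correct(gold, pred))
```

In this toy case three of four heads match and two of four head-plus-label pairs match, so the unlabeled score is higher than the labeled one, the same ordering as in the reported 77.18% and 69.90%. Corpus-level figures would normally be computed over all words in the test set rather than averaged per sentence.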
Keywords/Search Tags:Contemporary Mongolian Language, Dependency Grammar, Tree-bank, Syntactic Parsing, Syntactic Annotation