Research On The Technology Of Building Kazakh Treebank Based On Cascade Conditional Random Fields

Posted on: 2016-03-04  Degree: Master  Type: Thesis
Country: China  Candidate: Z J Yu
GTID: 2308330476450403  Subject: Computer application technology
Abstract/Summary:
A Kazakh treebank is of self-evident importance for automatic Kazakh syntactic analysis and for active research fields such as machine translation, text mining, and tacit knowledge discovery. This matters all the more because Kazakh treebank technology is still in its infancy compared with that for Chinese, English, and other languages. How to build a treebank for the language better and faster while saving human and material resources is therefore an important open problem. Treebank construction presupposes syntactic analysis of Kazakh sentences, but existing Kazakh sentence analysis techniques cannot yet meet the requirements of Chinese information processing, which makes research on Kazakh syntactic analysis particularly important. This thesis applies a statistical syntactic analysis method based on cascaded conditional random fields (CRFs) to the annotation of Kazakh sentences and ultimately builds a complete syntactic treebank. The work is divided into four parts. First, a tag set and annotation guidelines for Kazakh sentences were selected. Second, a manual annotation platform was built for corpus annotation and preprocessing, producing corpora in the interface format the model expects; these serve as the low-level and high-level training corpora, respectively. Third, statistical Kazakh syntactic tagging was studied using the cascaded CRF model, and a transformation-based error-driven learning algorithm was introduced to correct the recognition results of the lower-level model.
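The abstract does not describe its rule templates or tag set. As a rough illustration of transformation-based error-driven learning (in the style of Brill's tagger), the sketch below learns correction rules for one hypothetical template, "change tag A to B when the previous tag is C", by greedily choosing the rule with the largest net error reduction against gold tags. The chunk-style tags (`B-NP`, `I-NP`, `O`) and the names `apply_rule`, `learn_rules`, and `correct` are illustrative assumptions, not the thesis's actual scheme.

```python
from typing import List, Tuple

# Hypothetical rule template (an assumption, not from the thesis):
# (from_tag, to_tag, prev_tag) means "change from_tag to to_tag when the preceding tag is prev_tag".
Rule = Tuple[str, str, str]


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Apply one rule; conditions are tested on the input tags (simultaneous update)."""
    frm, to, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == frm and tags[i - 1] == prev:
            out[i] = to
    return out


def count_correct(pred: List[str], gold: List[str]) -> int:
    return sum(p == g for p, g in zip(pred, gold))


def learn_rules(predicted: List[List[str]], gold: List[List[str]], max_rules: int = 10) -> List[Rule]:
    """Greedily learn rules that most reduce the errors of a lower-level tagger."""
    current = [list(s) for s in predicted]
    rules: List[Rule] = []
    for _ in range(max_rules):
        # Propose candidate rules from the errors that remain.
        candidates = {
            (pred[i], ref[i], pred[i - 1])
            for pred, ref in zip(current, gold)
            for i in range(1, len(pred))
            if pred[i] != ref[i]
        }
        best, best_gain = None, 0
        for rule in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = sum(
                count_correct(apply_rule(pred, rule), ref) - count_correct(pred, ref)
                for pred, ref in zip(current, gold)
            )
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule yields a net improvement
            break
        rules.append(best)
        current = [apply_rule(pred, best) for pred in current]
    return rules


def correct(tags: List[str], rules: List[Rule]) -> List[str]:
    """Apply learned rules in the order they were learned."""
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


if __name__ == "__main__":
    predicted = [["B-NP", "B-NP", "O"], ["O", "B-NP", "B-NP"]]
    gold = [["B-NP", "I-NP", "O"], ["O", "B-NP", "I-NP"]]
    rules = learn_rules(predicted, gold)
    print(rules)  # [('B-NP', 'I-NP', 'B-NP')]
    print(correct(["B-NP", "B-NP", "B-NP"], rules))  # ['B-NP', 'I-NP', 'I-NP']
```

Because rules are applied in the order learned, the lower-level output can be cleaned this way before it is handed to the higher-level model.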
Finally, the syntactic tagging results were output by a decoding algorithm, special ambiguities were proofread manually, and an automatic correction method introduced between the levels of the hierarchical model alleviates error propagation in syntactic analysis. The proposed treebank-building techniques were evaluated on a Kazakh corpus drawn from Xinjiang Daily (Kazakh edition), which verified their feasibility and effectiveness. Experimental results show that the treebank-building method based on cascaded conditional random fields effectively improves both the precision of Kazakh syntactic tagging and the overall processing efficiency. It also greatly reduces the human and material resources required and lays a foundation for later work on Kazakh syntax-based machine translation and text mining.
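The abstract does not name its decoding algorithm. For a linear-chain CRF the usual exact decoder is the Viterbi algorithm, so the sketch below assumes it: given per-token emission scores and tag-to-tag transition scores (log-potentials assumed to be already computed from the model's features and weights), it returns the highest-scoring tag sequence. The tag names and score values in the demo are hypothetical.

```python
from typing import List, Optional, Sequence


def viterbi(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Optional[Sequence[float]] = None,
) -> List[int]:
    """Return the highest-scoring tag index sequence for a linear-chain model.

    emissions[t][j]: score of tag j at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    start[j]: optional score for starting in tag j (defaults to 0).
    """
    n = len(emissions)
    if n == 0:
        return []
    k = len(emissions[0])
    if start is None:
        start = [0.0] * k
    score = [start[j] + emissions[0][j] for j in range(k)]
    backpointers: List[List[int]] = []
    for t in range(1, n):
        new_score, pointers = [], []
        for j in range(k):
            # Best previous tag i for ending in tag j at position t.
            best_i = max(range(k), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(range(k), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    TAGS = ["B", "I", "O"]  # hypothetical chunk tags
    # O -> I is heavily penalised, so "I" should not follow "O".
    transitions = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, -10.0, 0.0]]
    emissions = [[1.0, 0.0, 1.5], [0.5, 2.0, 0.0]]
    print([TAGS[i] for i in viterbi(emissions, transitions)])  # ['B', 'I']
```

Without the transition penalty the per-token best tags would be `O, I`; the penalty makes `B, I` the best-scoring sequence. In a cascade, decoded lower-level tags would be fed to the higher-level model as input, which is where errors can propagate between levels.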
Keywords/Search Tags: Kazakh, Syntactic analysis, Cascaded conditional random fields, Transformation-based error-driven learning, Treebank