
A Study On The Translation Methods Of Chinese - Vietnamese Phrases Based On Linguistic Features

Posted on: 2017-04-08  Degree: Master  Type: Thesis
Country: China  Candidate: C T Lv  Full Text: PDF
GTID: 2175330488450194  Subject: Control engineering
Abstract/Summary:
Vietnam is an important neighbor on China's southwestern border, and research on Chinese-to-Vietnamese statistical machine translation is valuable for bilingual understanding, public opinion analysis, information retrieval and cultural exchange. At present, work on Chinese-to-Vietnamese statistical machine translation focuses mainly on building bilingual resources and on word alignment, while research on translation itself is still in its infancy.

Chinese and Vietnamese grammar differ markedly. The most significant difference is that the relative position of a modifier (attributive or adverbial) and the word it modifies is reversed: in Vietnamese the attributive follows the noun and the adverb follows the adjective or verb, whereas in Chinese the modifier comes first. This analysis shows that Vietnamese and Chinese differ clearly in word order, and that the differences are regular: the order of the modifier and the modified term in Vietnamese is the reverse of that in Chinese, and the two are adjacent in the sentence. By analyzing and generalizing these differences, we set up reordering rules, add them to phrase-based statistical machine translation, and then explore how the characteristics of Chinese and Vietnamese affect the performance of the statistical machine translation system.

(1) A phrase-based Chinese-Vietnamese statistical machine translation method. First, the Chinese and Vietnamese sides of the bilingual parallel sentences are segmented with the Stanford Chinese word segmentation tool and a Vietnamese word segmentation tool developed in our laboratory, respectively. GIZA++ is then used to obtain word alignments for the parallel sentences.
Next, Chinese-Vietnamese phrase pairs are extracted, which yields the phrase translation probability table, and the translation model is trained from this table. Decoding uses a CKY decoder together with a lexical reordering model. In the experiments, the translation performance of the phrase-based Chinese-Vietnamese machine translation system is tested with N-gram language models of different orders. The experimental results show that the system performs better with bigram and trigram language models.

(2) Based on the characteristic post-positioning of Vietnamese modifiers, a Chinese-Vietnamese statistical machine translation method that incorporates a modifier post-positioning feature function is proposed. First, we analyze the grammatical differences between Chinese and Vietnamese and extract the differences in attributive position, adverbial position and qualifier order. Second, we define reordering blocks based on these differences, add a mapping of reordering blocks to the decoding algorithm of the phrase-based statistical machine translation model, and record the N-best candidate translations together with the score P produced by decoding. Third, we reorder the reordering blocks with the post-positioning reordering algorithm and estimate a score D from an unconditional maximum likelihood probability distribution. Finally, the best translation is chosen on the basis of score P and score D. A phrase-based statistical machine translation system with a lexicalized reordering model serves as the baseline. The experimental results show that our method effectively improves translation quality.

(3) A Chinese-Vietnamese phrase-based statistical machine translation system incorporating linguistic features. Chinese and Vietnamese linguistic characteristics are added as features to the decoding process of the phrase-based statistical machine translation system.
A prototype phrase-based statistical machine translation system is built within a Java Web framework using basic open-source tools (word segmentation tools, a word alignment tool, etc.).
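The reordering and rescoring steps of method (2) can be illustrated with a small Python sketch. It is not the thesis implementation. The names `Candidate`, `postpose_modifiers`, `score_d` and `rerank`, the block encoding `(start, mid, end)`, the probability floor for unseen blocks, and the weighted sum of P and D are all assumptions made for illustration. The abstract does not say how the two scores are combined or how D is estimated in detail. Here D is approximated by the log relative frequency of each reordered block in a hypothetical table of Vietnamese block counts.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One N-best hypothesis (hypothetical structure, not from the thesis)."""
    tokens: list    # target-side tokens, modifiers still in Chinese-like order
    blocks: list    # (start, mid, end): modifier = tokens[start:mid], head = tokens[mid:end]
    score_p: float  # decoder score P, assumed here to be a log-probability


def postpose_modifiers(tokens, blocks):
    """Move each modifier behind its head. Blocks must not overlap."""
    out = list(tokens)
    for start, mid, end in blocks:
        # The swap keeps the block length, so later indices stay valid.
        out[start:end] = tokens[mid:end] + tokens[start:mid]
    return out


def score_d(tokens, blocks, block_counts, total, floor=1e-6):
    """Log relative frequency of each reordered block (assumed form of score D)."""
    d = 0.0
    for start, _mid, end in blocks:
        phrase = " ".join(tokens[start:end])
        # Unseen blocks get a small floor probability (an assumption).
        d += math.log(max(block_counts.get(phrase, 0) / total, floor))
    return d


def rerank(candidates, block_counts, total, weight=1.0):
    """Pick the candidate with the highest P + weight * D (assumed combination)."""
    best, best_score = None, -math.inf
    for cand in candidates:
        reordered = postpose_modifiers(cand.tokens, cand.blocks)
        combined = cand.score_p + weight * score_d(
            reordered, cand.blocks, block_counts, total
        )
        if combined > best_score:
            best, best_score = reordered, combined
    return best, best_score
```

As a usage illustration, a candidate with tokens `["tôi", "mua", "mới", "sách"]` and the single block `(2, 3, 4)` is reordered so that the noun "sách" (book) precedes its attributive "mới" (new), which matches the modifier-after-head order described in the abstract.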
Keywords/Search Tags:statistical machine translation, Chinese-Vietnamese, phrase extraction, language features, lexical reordering model