The Implementation Of Cyrillic Mongolian-Chinese Machine Translation System

Posted on: 2016-10-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R L G Wu
GTID: 1225330461980876
Subject: Chinese Ethnic Language and Literature
Abstract/Summary:
Machine translation among the languages of multiple nationalities matters for maintaining social stability, sharing advanced technology, promoting communication, and preserving national culture. The study of Cyrillic Mongolian-Chinese machine translation faces many problems: a large gap between the grammars of the two languages, complex linguistic phenomena, and a lack of both resources and previous work. Many of these are also frontier problems in machine translation in general, such as translation modeling for languages with complex linguistic structure and few resources.

We build a statistical machine translation system for Cyrillic Mongolian-Chinese translation using open-source tools and language resources we collected ourselves. To improve translation quality, we make the following attempts:
1) Formulate a standard for the Cyrillic Mongolian-Chinese bilingual corpus and establish a large-scale corpus of 220,000 sentences.
2) Formulate preprocessing steps for the Cyrillic Mongolian corpus: encoding unification, abbreviation conversion, case conversion, etc.
3) Recognize and translate named entities using Cyrillic Mongolian-Chinese dictionaries of person names and place names, together with rules for numerals and time words.
4) Analyze the special grammatical phenomena of Cyrillic Mongolian in detail and improve segmentation accordingly.

Building on phrase-based statistical machine translation, this dissertation focuses on the following issues: choosing a suitable translation granularity, translation modeling with linear word representations, translation modeling with nonlinear word representations, acquiring machine translation knowledge for a low-resource language through human-machine collaboration, foundational techniques for Mongolian information processing, and the construction of language resources. The final goal is a Cyrillic Mongolian-Chinese machine translation system for two domains: government documents and everyday language.

From the collected bilingual corpus, we set aside the development set and the test set and use the remaining sentences as the training set. The language model is trained on the target-language side of the training set.

On this basis we implement the Cyrillic Mongolian-Chinese machine translation system and carry out evaluation experiments. The test set contains 1,000 sentences, each with one source sentence (Cyrillic Mongolian) and four target-language sentences (Chinese) as reference translations. BLEU-SBP, an automatic machine translation metric, is used as the evaluation measure. The evaluation of the system's output shows that the scale of the bilingual corpus and the processing of linguistic information are the keys to improving translation quality.
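The preprocessing steps named in the abstract (unifying the encoding, abbreviation conversion, case conversion) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the choice of Unicode NFC as the unified form, lowercasing as the direction of case conversion, the whitespace tokenization, and the single entry in the `ABBREVIATIONS` table are all assumptions made for illustration, and the function name `preprocess` is hypothetical.

```python
import unicodedata

# Hypothetical abbreviation table with one illustrative entry;
# the actual list used in the dissertation is not given in the abstract.
ABBREVIATIONS = {
    "г.м.": "гэх мэт",
}


def preprocess(line: str) -> str:
    """Apply the three preprocessing steps to one Cyrillic Mongolian line."""
    # Unify encoding: bring every character to one Unicode form (NFC assumed),
    # so that e.g. "и" + combining breve becomes the single letter "й".
    text = unicodedata.normalize("NFC", line)
    # Case conversion: lowercase everything (assumed direction).
    text = text.lower()
    # Abbreviation conversion: expand known abbreviations token by token.
    # Splitting on whitespace also collapses repeated spaces.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)
```

Calling `preprocess` on each source line before alignment and training would give the translation model a single consistent surface form per word, which matters most when the corpus is small.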
Keywords/Search Tags:Mongolian information processing, Machine translation, Language resources construction, Named entity translation