Research On Chinese-Vietnamese Machine Translation Methods In The Metallurgical Field

Posted on: 2017-11-19  Degree: Doctor  Type: Dissertation
Country: China  Candidate: S X Gao  Full Text: PDF
GTID: 1315330512462839  Subject: Metallurgical Control Engineering
Abstract/Summary:
Machine translation is among the most effective means of cross-language information exchange. With the implementation of China's "Belt and Road" strategy, Chinese-Vietnamese machine translation is playing an increasingly important role in promoting bilateral communication and exchange in fields such as politics, economics, and culture. China currently has a good cooperative relationship with Vietnam in the metallurgical field. As a result, large numbers of domain documents, scientific papers, and industry reports in this field need to be translated, and translating this material automatically would do much to promote cooperation between the two countries' metallurgical industries. At present, Chinese-Vietnamese machine translation has not received wide attention, and few studies address a specific domain, which severely restricts industry-oriented cross-language information exchange. Because of the substantial differences between Chinese and Vietnamese and the distinctive features of domain-specific translation, traditional machine translation methods cannot be applied directly to industry-oriented Chinese-Vietnamese machine translation. Several key problems remain to be solved, such as bilingual terminology acquisition, automatic annotation of bilingual word alignment, and translation methods that account for Chinese-Vietnamese language differences and metallurgical domain features. In this dissertation, drawing on Chinese-Vietnamese grammatical differences and metallurgical domain features, we study several key problems of machine translation in the metallurgical field: bilingual metallurgical terminology acquisition, Chinese-Vietnamese word alignment, tree-to-tree syntax-based statistical machine translation constrained by language features, and syntax-based statistical machine translation with metallurgical domain features.
The contribution of this work is four-fold.
(1) To address the difficulty of bilingual term acquisition when Chinese-Vietnamese parallel domain corpora are scarce, we propose a method for automatically acquiring bilingual metallurgical terms through a pivot language. Using the large amount of Chinese-English and English-Vietnamese parallel text and scientific literature available on the Internet, a conditional random field model is applied to recognize metallurgical terms in Chinese domain texts. A Chinese-English phrase probability table and an English-Vietnamese phrase probability table are built with phrase-based SMT. Following the pivot idea, the two tables are mapped through English to generate a Chinese-Vietnamese phrase probability table. After this table is filtered by the recognized Chinese metallurgical terms, we obtain a Chinese-Vietnamese metallurgical term base. Experiments show that the proposed method is effective and addresses the problem of Chinese-Vietnamese bilingual metallurgical term extraction when parallel corpora are scarce.
(2) For the automatic annotation of Chinese-Vietnamese word alignment, we propose a word alignment method based on a bidirectional RNN and language difference features. By analyzing the differences between Chinese and Vietnamese in attributive postposition, adverbial postposition, and other structural positions, we define some positive switch functions and some structural adjustment functions, and integrate them into the loss function used to train the bidirectional recurrent neural network model. Experiments on Chinese-Vietnamese bilingual word alignment show that the proposed method substantially outperforms state-of-the-art methods, and that language features and bidirectional contextual alignment information effectively improve word alignment.
(3) Considering the differences between Chinese and Vietnamese, we propose a Chinese-Vietnamese tree-to-tree statistical machine translation method with language features. Language difference features provide useful supervision for machine translation. By analyzing the syntactic differences between Chinese and Vietnamese, we define several language difference rules: an attributive postposition award, a time adverbial postposition award, and a locative adverbial postposition award. On the basis of a Chinese-Vietnamese word-aligned bilingual corpus, these awards are incorporated into the extraction of tree-to-tree translation rules. The defined rules are used to constrain decoding and to prune and optimize the candidate sentences, yielding the optimal translation sequence. Experiments on Chinese-Vietnamese sentence translation show that the proposed method performs well and that syntactic difference features greatly improve the efficiency and accuracy of translation.
(4) To improve translation accuracy for metallurgical domain documents, we propose a Chinese-Vietnamese machine translation method with domain features. By analyzing the characteristics of the metallurgical domain and their influence on machine translation, and on the basis of the domain term base and domain corpus, we construct a bilingual terminology-topic distribution model, a paragraph topic coherence model, and a domain knowledge model based on Freebase. Within the tree-to-tree translation model with language features, the bilingual domain terminology, the bilingual term-topic probability distribution, paragraph domain coherence, and domain knowledge relations are applied to terminology translation, word selection, word combination, and pruning optimization during decoding. Domain characteristics are thus used more effectively to enhance domain translation performance. Comparative experiments on metallurgy-oriented Chinese-Vietnamese translation tasks show that the proposed method substantially outperforms other state-of-the-art methods and that domain topics, paragraph topic coherence, and external domain knowledge greatly improve the accuracy and efficiency of domain-oriented text translation.
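
The pivot step in contribution (1) can be illustrated with a small sketch. It is not the dissertation's implementation: it assumes the common triangulation formula p(vi | zh) = Σ_en p(en | zh) · p(vi | en), and the function names, phrase pairs, and probabilities below are hypothetical toy values. The term set stands in for the output of the CRF term recognizer.

```python
from collections import defaultdict


def pivot_phrase_table(src_pivot, pivot_tgt):
    """Estimate p(tgt | src) by summing p(pivot | src) * p(tgt | pivot) over pivots."""
    src_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_pivot.items():
        for pivot, p_pivot in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt in pivot_tgt.get(pivot, {}).items():
                src_tgt[src][tgt] += p_pivot * p_tgt
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {src: dict(tgts) for src, tgts in src_tgt.items()}


def filter_by_terms(table, terms):
    """Keep only entries whose source phrase is a recognized domain term."""
    return {src: tgts for src, tgts in table.items() if src in terms}


# Toy, hand-made probabilities (illustrative only, not from the dissertation).
zh_en = {
    "高炉": {"blast furnace": 0.9, "furnace": 0.1},
    "今天": {"today": 1.0},
}
en_vi = {
    "blast furnace": {"lò cao": 1.0},
    "furnace": {"lò": 0.8, "lò cao": 0.2},
    "today": {"hôm nay": 1.0},
}
# Hypothetical stand-in for the terms recognized by the CRF model.
metallurgy_terms = {"高炉"}

zh_vi = pivot_phrase_table(zh_en, en_vi)
term_base = filter_by_terms(zh_vi, metallurgy_terms)

# Pick the most probable Vietnamese candidate for the Chinese term.
best = max(term_base["高炉"], key=term_base["高炉"].get)
print(best, round(term_base["高炉"][best], 2))
```

Summing over all English pivots lets several pivot phrases reinforce the same Vietnamese candidate, while filtering afterwards keeps the general-domain entries (here "今天") out of the term base.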
Keywords/Search Tags: machine translation, Chinese-Vietnamese, metallurgy domain, bilingual term extraction, language difference, word alignment, tree-to-tree SMT, domain-specific syntactic SMT