
Research On Chinese-English-Myanmar Neural Machine Translation Based On Multilingual Joint Learning

Posted on: 2021-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Li
GTID: 2515306200453334
Subject: Control Engineering
Abstract/Summary:
Machine translation already produces good results, and neural machine translation has further improved translation quality and made communication between people of different countries more convenient. However, machine translation requires a large amount of parallel data, and Burmese is a resource-scarce language. Few Chinese-Burmese parallel corpora are published on the Internet, which has slowed the development of Chinese-Burmese translation. To address the poor translation quality caused by the lack of parallel data, researchers have studied neural machine translation with multilingual joint learning, which uses resource-rich languages to improve translation for resource-scarce languages. Multilingual joint learning has become an active research topic. On this basis, this thesis uses the resource-rich Chinese-English pair to improve translation for the resource-scarce Chinese-Burmese pair, and studies Chinese-English-Burmese multilingual translation methods based on a shared decoder and on multilingual joint training. The research has practical application value. The main results of the thesis are as follows:

(1) Construction of Chinese-Myanmar, Chinese-English, and English-Myanmar parallel corpora
Chinese-Myanmar parallel corpus resources are scarce, and there is no public, authoritative Chinese-Myanmar parallel corpus in China or abroad. This thesis therefore describes how the Chinese-Myanmar, Chinese-English, and English-Myanmar parallel corpora were acquired: how the text was collected with web crawlers, and which problems arose during crawling and how they were solved. The thesis also uses the LDA topic model and bilingual word vectors to construct comparable documents from the collected Chinese and Burmese chapter-level documents, and extracts parallel sentences from those comparable documents. Finally, the sizes of the constructed corpora are summarized: the Chinese-English parallel corpus contains more than 2 million sentence pairs, the English-Myanmar parallel corpus nearly 200,000 sentence pairs, and the Chinese-Myanmar parallel corpus more than 100,000 sentence pairs.

(2) Chinese-English-Burmese multilingual translation method based on a shared decoder
To address the lack of Chinese-Burmese parallel data, we propose a multilingual translation method based on a shared decoder. English and Burmese are the source languages and Chinese is the target language. We use the BERT model to train word embeddings, and train English-Burmese bilingual word embeddings that serve as the vocabulary of the translation model. English and Burmese are then encoded by separate encoders, and both share the same decoder, which produces the Burmese-to-Chinese translation. Both the encoders and the decoder use long short-term memory (LSTM) networks.

(3) Chinese-English-Burmese neural machine translation method based on multilingual joint training
Because Chinese, English, and Burmese differ considerably from one another, we propose a Chinese-English-Burmese neural machine translation method based on multilingual joint training, which implements multilingual-to-multilingual translation. We use the Transformer network for Chinese-English-Burmese multilingual translation and map the Chinese, English, and Burmese word vectors into the same semantic space inside the model, which reduces the differences between the three languages. Since both the source side and the target side cover three languages, we explore having all source and target languages share the same encoder and decoder, with the final aim of improving translation quality for Burmese.

(4) A prototype Chinese-English-Myanmar neural machine translation system based on multilingual joint learning
Using Web technology, we developed a prototype Chinese-English-Myanmar neural machine translation system based on multilingual joint learning. It provides a visual display of Chinese-Myanmar and English-Myanmar translation, and offers technical support for using multilingual joint learning to improve translation of low-resource Southeast Asian languages. The system consists of a sentence input module, a word segmentation module, a translation module, and an output module.
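
The joint-training setup in result (3) can be illustrated with a small data-preparation sketch. The abstract does not say how a single shared encoder and decoder is told which language to produce; the sketch below assumes one common convention, a target-language tag prefixed to each source sentence. The tag format `<2xx>`, the function names `tag_example` and `build_joint_corpus`, and the toy sentences are all hypothetical and are not taken from the thesis.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def tag_example(src: str, tgt: str, tgt_lang: str) -> Pair:
    """Prefix the source with a target-language tag such as '<2zh>'.

    Assumption: the tag tells a shared encoder/decoder which language
    to generate. The thesis abstract does not specify this mechanism.
    """
    return (f"<2{tgt_lang}> {src}", tgt)


def build_joint_corpus(
    corpora: Dict[Tuple[str, str], List[Pair]], seed: int = 0
) -> List[Pair]:
    """Merge every language pair, in both directions, into one shuffled list."""
    mixed: List[Pair] = []
    for (src_lang, tgt_lang), pairs in corpora.items():
        for src, tgt in pairs:
            mixed.append(tag_example(src, tgt, tgt_lang))  # forward direction
            mixed.append(tag_example(tgt, src, src_lang))  # reverse direction
    random.Random(seed).shuffle(mixed)  # interleave languages within training
    return mixed


# Toy data standing in for the three corpora described in result (1).
toy_corpora = {
    ("en", "zh"): [("hello", "你好")],
    ("my", "zh"): [("မင်္ဂလာပါ", "你好")],
    ("en", "my"): [("hello", "မင်္ဂလာပါ")],
}

joint = build_joint_corpus(toy_corpora)
for source, target in joint:
    print(source, "->", target)
```

Training one model on such a mixed list is what lets the small Chinese-Myanmar corpus benefit from the much larger Chinese-English corpus; in practice the smaller pairs are often oversampled, which this sketch leaves out.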
Keywords/Search Tags: Neural machine translation, Parallel corpus, Chinese-English-Burmese multilingual, Shared decoder, Shared semantic space