Research On Chinese-Vietnamese Neural Machine Translation Method Based On Domain Knowledge Enhancement

Posted on: 2023-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: L L Zhang
GTID: 2555306797973279
Subject: Computer application technology
Abstract/Summary:
Driven by the rapid development of deep learning, the performance of Neural Machine Translation (NMT) has improved significantly, and translation quality for mainstream languages has gradually approached or reached the level of human translators. For the Chinese-Vietnamese language pair, however, domain-specific corpora are small and cover few domains, and bilingual knowledge bases and parallel corpora are lacking. As a result, NMT models for specific domains are insufficiently trained, generalize poorly, and produce unsatisfactory translations. Improving domain-specific Chinese-Vietnamese NMT under low-resource conditions therefore has broad market application potential and academic research value. In this thesis, domain knowledge is integrated into the process of Chinese-Vietnamese neural machine translation, and the following research work has been completed:

(1) Parallel sentence pair extraction method based on a pre-trained language model and bidirectional interactive attention

Screening high-quality parallel sentence pairs from comparable corpora on the Internet is one effective way to improve low-resource machine translation. However, such comparable corpora usually contain a large amount of noisy data, and filtering this noise accurately and effectively is the main challenge of parallel sentence extraction. To address this problem, this thesis proposes a cross-language text semantic matching method that integrates pre-trained semantic representations with an interactive attention mechanism and exploits the consistency of bilingual semantic representations to extract parallel sentence pairs in a noisy environment. The specific idea is as follows: first, use the pre-trained language model to obtain representations of the source-language and target-language sentences respectively; then, align the cross-language features in semantic space through the interactive attention mechanism; finally, determine whether the cross-language sentence pair is parallel based on the semantic representation obtained by fusing multi-view features. Experimental results on the artificially constructed Chinese-Vietnamese corpus and the IWSLT15 English-Vietnamese corpus show that the proposed method outperforms existing parallel sentence pair extraction models. In addition, with the help of the extracted parallel corpus, the performance of the machine translation model is significantly improved.

(2) Chinese-Vietnamese neural machine translation method based on domain knowledge enhancement

A specific domain contains a large number of domain words, which cannot be translated well under low-resource conditions. How to integrate domain knowledge into machine translation models and improve translation performance in specific domains is an urgent problem. To address it, this thesis proposes a Chinese-Vietnamese neural machine translation method based on domain knowledge enhancement, which first identifies the domain words in the source sentence. In the encoding stage, a domain knowledge encoding module is introduced to learn vector representations of the domain words, and a domain word-encoder attention mechanism is introduced to enhance the vector representation of the source-language sentence using the domain words. In the decoding stage, a domain word-decoder attention mechanism is introduced so that the domain words and the source sentence jointly guide the generation of the target translation. Experimental results on data constructed for the COVID-19 epidemic domain show that the performance of Chinese-Vietnamese neural machine translation in a specific domain is improved.
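The domain word-encoder attention described in (2) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the sketch assumes scaled dot-product attention, with source hidden states as queries and domain-word vectors as keys and values, followed by a residual addition. The function name `domain_word_attention`, the residual fusion, and the toy vectors are all hypothetical and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def domain_word_attention(states, domain_vectors):
    """Enhance each hidden state with a weighted sum of domain-word vectors.

    states: hidden-state vectors (queries), one per source token.
    domain_vectors: domain-word vectors (keys and values), assumed to have
        the same dimension as the states.
    Returns (enhanced_states, attention_weights).
    """
    if not domain_vectors:
        # No domain words were identified: leave the states unchanged.
        return [list(s) for s in states], [[] for _ in states]

    scale = math.sqrt(len(domain_vectors[0]))
    enhanced, weights = [], []
    for state in states:
        # Score the state against every domain word, then normalise.
        scores = [dot(state, d) / scale for d in domain_vectors]
        w = softmax(scores)
        # Context vector: attention-weighted sum of domain-word vectors.
        context = [
            sum(wi * d[k] for wi, d in zip(w, domain_vectors))
            for k in range(len(state))
        ]
        # Assumed fusion: residual addition of state and context.
        enhanced.append([s + c for s, c in zip(state, context)])
        weights.append(w)
    return enhanced, weights


# Toy example: three source-token states and two domain-word vectors.
source_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
domain_vectors = [[2.0, 0.0], [0.0, 2.0]]
enhanced, weights = domain_word_attention(source_states, domain_vectors)
```

Under the same assumptions, the domain word-decoder attention would reuse this function with decoder states as the queries. A real model would add learned projection matrices and multiple attention heads, which are omitted here for brevity.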
Keywords/Search Tags:Pretrained language models, Semantic matching, Domain knowledge, Low-resource neural machine translation