
Research On Multilingual And Cross-domain Neural Machine Translation Technology Based On Transformer

Posted on: 2024-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
Full Text: PDF
GTID: 2568307073468264
Subject: Computer Science and Technology
Abstract/Summary:
Neural machine translation is a technique that uses artificial neural networks to translate one language (the source language) into another (the target language). Among the various machine translation methods, neural machine translation has achieved high-quality translation in general domains with large amounts of parallel corpus data, thanks to neural network technology. However, problems remain in multilingual and domain-adaptive neural machine translation: (1) the scarcity of corpora for minority languages and specialized domains means that translation models cannot effectively learn word vector representations under low-resource conditions, leading to mistranslation and omission; (2) in multilingual neural machine translation, it is unclear how to transfer knowledge from high-resource languages to enhance semantic learning for low-resource languages; (3) in domain-adaptive neural machine translation, domain knowledge overfits, a single model can serve only one domain, and training requires large-scale manual parameter tuning.

In response to the above problems, this study focuses on the following:

(1) To address the scarcity of corpora for minority languages and specialized fields, a Scrapy crawler system was used to collect more than 1 million patent texts. Through data cleaning, chapter segmentation, domain filtering, and machine translation, more than 100,000 parallel sentence pairs were constructed in six language pairs, such as English-Japanese and English-Spanish, in the information technology domain. The constructed parallel corpora were evaluated with indicators based on sentence length, the translation quality of content words, and the translation quality of phrases. The top 25% and bottom 25% of the evaluated corpus were then used for translation model training. The results show that the BLEU scores of models trained on the top 25% of the corpus are all higher than those of models trained on the bottom 25%, with the English-French model achieving the highest BLEU value, 1.18.

(2) To address the problem of transferring knowledge from high-resource languages to enhance semantic learning for low-resource languages in multilingual translation, a neural machine translation method based on semantic space sharing and self-back-translation is proposed. The method uses semantic space sharing to map the lexical representations of multiple languages into a common semantic space with shared word representations. A self-back-translation strategy is then integrated into the semantic space sharing model: at each training step, the sentences predicted in forward translation are back-translated to fit the source sentences, acquiring more contextual knowledge from a limited corpus. Experiments were conducted on four low-resource datasets translating Romanian (Ro), Azerbaijani (Aze), Belarusian (Bel), and Galician (Glg) into English. The results show BLEU improvements of 4.3 for Romanian (Ro) and 5.1 for Galician (Glg) over the baseline model, indicating that the proposed method significantly improves translation quality in multilingual low-resource settings.

(3) To address knowledge overfitting, poor model flexibility, and the dominance of human experience in domain-adaptive neural machine translation, this paper proposes a multi-domain adaptive approach (KAIP) based on knowledge augmentation and incremental pruning. The method uses a knowledge-hiding strategy in which an auxiliary corpus drives auxiliary task learning during training, augmenting the knowledge passed forward from the encoder to the decoder; it then uses a model pruning strategy to learn multiple disjoint domain-specific sub-networks, adapting to multiple different domains without adjusting the model. Single- and multi-domain adaptation tasks on four target-domain datasets and five extended-domain datasets show significant BLEU improvements in every domain: 2.3 on the Novel domain, 1.1 on the EMEA domain, and 1.4 on the IT domain. This verifies that the proposed method can effectively handle domain-adaptive tasks.
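The corpus-ranking step in contribution (1) can be illustrated with a minimal sketch. The `length_ratio_score` heuristic below is an illustrative stand-in for the thesis's actual indicators (sentence length, content-word quality, phrase quality), and the sample pairs are invented; only the top/bottom 25% split mirrors the described procedure.

```python
def length_ratio_score(src: str, tgt: str) -> float:
    """Penalize pairs whose token counts diverge sharply (likely misalignment)."""
    ls, lt = len(src.split()), len(tgt.split())
    if ls == 0 or lt == 0:
        return 0.0
    return min(ls, lt) / max(ls, lt)

def split_by_quality(pairs, score=length_ratio_score, frac=0.25):
    """Return (top, bottom): the best and worst `frac` of the corpus by score."""
    ranked = sorted(pairs, key=lambda p: score(*p), reverse=True)
    k = max(1, int(len(ranked) * frac))
    return ranked[:k], ranked[-k:]

# Toy English-French pairs (hypothetical data, not from the thesis corpus).
pairs = [
    ("the patent covers a method", "le brevet couvre une méthode"),
    ("hello", "bonjour tout le monde et merci beaucoup"),
    ("data cleaning step", "étape de nettoyage des données"),
    ("system", "le système informatique décrit dans la revendication"),
]
top, bottom = split_by_quality(pairs)
```

Training one model on `top` and another on `bottom`, then comparing BLEU, is the contrastive evaluation the abstract describes.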
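The two ideas in contribution (2) can be sketched conceptually: all languages draw token vectors from one shared embedding table (semantic space sharing), and a reconstruction loss compares the back-translated prediction with the source (self-back-translation). The vocabulary, dimensions, and the toy round-trip "translators" below are assumptions for illustration, not the thesis's model.

```python
import numpy as np

rng = np.random.default_rng(0)
# One shared vocabulary and embedding table for every language; target-language
# tags such as <2en> are a common multilingual-NMT convention assumed here.
shared_vocab = {"<2en>": 0, "<2ro>": 1, "hello": 2, "salut": 3, "lume": 4, "world": 5}
E = rng.normal(size=(len(shared_vocab), 8))

def embed(tokens):
    """Look up rows of the single shared table, regardless of language."""
    return E[[shared_vocab[t] for t in tokens]]

def self_back_translation_loss(src_tokens, forward, backward):
    """Translate forward, translate the prediction back, and penalize the
    distance between the reconstruction and the original source embedding."""
    src_vec = embed(src_tokens).mean(axis=0)
    pred = forward(src_tokens)      # source -> predicted target sentence
    recon = backward(pred)          # predicted target -> source again
    recon_vec = embed(recon).mean(axis=0)
    return float(np.sum((src_vec - recon_vec) ** 2))

# A perfect round trip reconstructs the source exactly, so the loss is zero.
toy_fwd = lambda toks: ["salut", "lume"]
toy_bwd = lambda toks: ["hello", "world"]
loss = self_back_translation_loss(["hello", "world"], toy_fwd, toy_bwd)
```

In the actual method this loss term would be added to the forward-translation objective at every training step, so the model extracts extra context from the same limited corpus.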
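The pruning side of contribution (3) can be sketched with magnitude pruning over a shared parameter tensor: each domain keeps its own binary mask, and switching domains means switching masks with no parameter changes. This is an illustration of the disjoint sub-network idea under assumed shapes and keep fractions, not the thesis's KAIP implementation.

```python
import numpy as np

def domain_mask(weights: np.ndarray, keep_frac: float) -> np.ndarray:
    """Binary mask keeping the top `keep_frac` of weights by magnitude."""
    k = int(weights.size * keep_frac)
    threshold = np.sort(np.abs(weights), axis=None)[-k]
    return (np.abs(weights) >= threshold).astype(weights.dtype)

rng = np.random.default_rng(42)
W = rng.normal(size=(4, 4))       # shared parameters, trained once

# One sub-network per domain; fractions here are arbitrary examples.
masks = {"novel": domain_mask(W, 0.5), "it": domain_mask(W, 0.25)}

def forward(x, domain):
    """Apply only the chosen domain's sub-network; other weights stay intact."""
    return x @ (W * masks[domain])
```

Because each mask only zeroes weights at inference time, adding a new domain later amounts to learning one more mask, without retuning the shared model.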
Keywords/Search Tags:Neural machine translation, Semantic space sharing, Corpus evaluation, Domain adaptive translation