Research On Machine Translation Technology For Scarce Languages

Posted on: 2022-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhu
Full Text: PDF
GTID: 2518306350493854
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and economic globalization, as well as the proposal of China’s "Belt and Road Initiative", machine translation technology plays an increasingly important role in promoting political, economic and cultural exchanges. In recent years, with the rise of deep learning, the performance of neural machine translation has improved greatly. Training a neural machine translation model depends mainly on the quality and scale of the corpus. Because training corpora for low-resource languages are very scarce, machine translation performance for these languages is far from satisfactory, and how to improve it has become an important research topic. This thesis studies neural machine translation methods for low-resource languages from two aspects. On the one hand, the data scale of low-resource languages is too small to train a language model. Drawing on experience from similar tasks, transfer learning is introduced to address this problem. First, transfer learning is verified to be effective in Chinese-related machine translation. Second, a parameter transfer method based on a hybrid model is proposed. On the other hand, to address the small scale of the bilingual corpus, this thesis attempts to further improve machine translation performance by expanding the corpus. First, monolingual data is used to construct pseudo data through self-learning as a way to expand the training corpus. Second, for languages whose corpus cannot be expanded with monolingual data, a data augmentation technique based on other bilingual data is proposed.

The experimental results show that the proposed parameter transfer method based on the hybrid model effectively improves the performance of neural machine translation. It achieves good results on Slovenian-Chinese, Latvian-Chinese, Croatian-Chinese and Lithuanian-Chinese translation models, with an average gain of 5 BLEU points. Monolingual data augmentation combined with mixed training on real source-language data and real target-language data also improves machine translation performance, raising the average BLEU score by 3 points in Croatian-Chinese bidirectional machine translation. Bilingual data augmentation using Hindi-English bilingual data improves the performance of the Hindi-Chinese translation model. In summary, the methods in this thesis can effectively improve machine translation performance for a range of low-resource languages.
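The monolingual data augmentation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no implementation details, so the function names (`pseudo_pairs_from_source`, `pseudo_pairs_from_target`, `mix_corpora`), the filtering rules, and the `translate_fwd` / `translate_bwd` callables are all hypothetical. The callables stand in for whatever baseline translation models would be used to generate pseudo data.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def pseudo_pairs_from_source(
    mono_src: List[str], translate_fwd: Callable[[str], str]
) -> List[Pair]:
    """Pair real source sentences with machine-generated targets."""
    return [(s, translate_fwd(s)) for s in mono_src]


def pseudo_pairs_from_target(
    mono_tgt: List[str], translate_bwd: Callable[[str], str]
) -> List[Pair]:
    """Pair machine-generated sources with real target sentences."""
    return [(translate_bwd(t), t) for t in mono_tgt]


def mix_corpora(real: List[Pair], *pseudo_sets: List[Pair]) -> List[Pair]:
    """Merge real and pseudo pairs, dropping empty and duplicate pairs.

    The filtering here is an assumption for illustration; the abstract
    does not say how the mixed training set was cleaned.
    """
    seen = set()
    mixed: List[Pair] = []
    for pair in [*real, *(p for ps in pseudo_sets for p in ps)]:
        src, tgt = pair
        if not src.strip() or not tgt.strip():
            continue  # skip pairs with an empty side
        if pair in seen:
            continue  # skip exact duplicates
        seen.add(pair)
        mixed.append(pair)
    return mixed


if __name__ == "__main__":
    # Toy stand-ins for baseline models (hypothetical, not real translators).
    fwd = lambda s: s.upper()
    bwd = lambda t: t.lower()
    real = [("a", "A")]
    corpus = mix_corpora(
        real,
        pseudo_pairs_from_source(["b", "a"], fwd),
        pseudo_pairs_from_target(["C", ""], bwd),
    )
    print(corpus)
```

In practice the two callables would wrap baseline models trained on the small real bilingual corpus, and the mixed corpus would then be used to retrain the translation model in each direction.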
Keywords/Search Tags: Low-resource Languages, Machine Translation, Transfer Learning, Data Augmentation