Deep Learning-based Machine Translation Research In China And Malaysia

Posted on: 2022-11-08   Degree: Master   Type: Thesis
Country: China   Candidate: S Y Wei
GTID: 2518306764983819   Subject: Automation Technology
Abstract/Summary:
In recent years, China and Malaysia have maintained close bilateral economic, political and cultural exchanges, and the demand for translation between Chinese and Malay has grown steadily. Human translation of Malay is costly and inefficient, so machine translation between the two languages has considerable research significance and practical value. The performance of current machine translation models depends on the quantity and quality of the training corpus, and the scarcity of Chinese-Malay corpus data makes it difficult to improve Chinese-Malay translation models. This thesis proposes a Chinese-Malay machine translation model based on transfer learning and Gumbel-Tree-LSTM optimization, which alleviates the shortage of corpus data in the Chinese-Malay translation task, and integrates these models into a prototype Chinese-Malay machine translation system. The main research work is as follows:

(1) Collection and processing of Chinese-Malay parallel corpus data. A Python-based web crawler is designed for Chinese-Malay text, and a Chinese-Malay parallel corpus dataset is constructed through corpus collection, keyword extraction and data preprocessing.

(2) A transfer learning approach for the Chinese-Malay translation model, based on the idea of pivot-language transfer. Chinese-English and English-Malay NMT models are first trained on their richer corpora; the parameters of these models, which share English as the pivot language, are then transferred to initialize the Chinese-Malay NMT model, improving the translation performance of this low-resource model. Using English as the pivot language helps reduce the mismatch caused by differences between languages when the model parameters are transferred. Experimental results show that the transfer learning approach improves translation performance.

(3) A Chinese-Malay machine translation model optimized with Gumbel-Tree-LSTM. The Bi-LSTM is optimized with Gumbel-Tree-LSTM: by learning the tree structure of each source sentence and using the resulting tree vectors as contextual information, the model better captures semantic relationships between words at medium and long distances. Experimental results show that this Gumbel-Tree-LSTM-based optimization improves translation quality.

(4) A prototype Chinese-Malay machine translation system combining transfer learning and Gumbel-Tree-LSTM. By combining the strengths of both methods through joint training, a prototype Chinese-Malay machine translation system with strong performance is obtained. The system can serve as a testbed for further investigating how additional languages can be incorporated into the Chinese-Malay neural machine translation framework.
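Item (2) describes initializing the low-resource Chinese-Malay model from parent models that share English as a pivot. The sketch below shows one plausible reading of that step, assuming the encoder of the Chinese-English model and the decoder of the English-Malay model are copied into the child model, which would then be fine-tuned on the Chinese-Malay corpus. The parameter layout (a dict keyed by hypothetical names prefixed `encoder.` and `decoder.`) and the function name `init_child_from_pivot` are illustrative assumptions, not the thesis's implementation.

```python
import copy
from typing import Dict, List

# Hypothetical parameter layout: name -> flat list of weights.
Params = Dict[str, List[float]]


def init_child_from_pivot(src_pivot: Params, pivot_tgt: Params) -> Params:
    """Initialize a source->target model from two pivot-sharing parents.

    Assumption: names starting with "encoder." belong to the encoder and
    names starting with "decoder." belong to the decoder. The encoder is
    taken from the source->pivot model (e.g. zh->en) and the decoder from
    the pivot->target model (e.g. en->ms).
    """
    child: Params = {}
    for name, value in src_pivot.items():
        if name.startswith("encoder."):
            # Deep copy so later fine-tuning of the child leaves the parent intact.
            child[name] = copy.deepcopy(value)
    for name, value in pivot_tgt.items():
        if name.startswith("decoder."):
            child[name] = copy.deepcopy(value)
    return child


# Toy parents with made-up weights, for illustration only.
zh_en = {"encoder.embed": [0.1, 0.2], "decoder.embed": [0.3, 0.4]}
en_ms = {"encoder.embed": [0.5, 0.6], "decoder.embed": [0.7, 0.8]}
zh_ms = init_child_from_pivot(zh_en, en_ms)
print(zh_ms)  # {'encoder.embed': [0.1, 0.2], 'decoder.embed': [0.7, 0.8]}
```

In a real toolkit the same idea would apply to a framework's parameter dictionaries; the abstract does not say whether embeddings or vocabularies are also transferred, so that detail is left open here.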
Keywords/Search Tags: neural machine translation, Chinese-Malay, transfer learning, Gumbel-Tree-LSTM
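Item (3) of the abstract relies on Gumbel-Tree-LSTM, which learns a tree over the source sentence by repeatedly merging adjacent nodes. The pure-Python sketch below follows the general scheme of the Gumbel Tree-LSTM of Choi, Yoo and Lee (AAAI 2018): every adjacent pair is composed with a binary Tree-LSTM cell, each candidate parent is scored against a query vector, and a Gumbel-perturbed argmax picks the pair to merge until a single root vector remains. It is a forward pass only, with random toy weights and zero-initialized leaf memory cells as simplifications; the class name `GumbelTreeLSTMSketch` is hypothetical, and how the thesis feeds the tree vectors into its Bi-LSTM is not shown.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]
Node = Tuple[Vec, Vec]  # (hidden state h, memory cell c)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GumbelTreeLSTMSketch:
    """Hypothetical forward-only Gumbel-Tree-LSTM encoder with toy random weights."""

    def __init__(self, dim: int, tau: float = 1.0, seed: int = 0):
        self.dim, self.tau = dim, tau
        self.rng = random.Random(seed)
        # Binary Tree-LSTM weights: 5 blocks (i, f_left, f_right, o, g) computed from [h_left; h_right].
        self.W = [[self.rng.uniform(-0.1, 0.1) for _ in range(2 * dim)] for _ in range(5 * dim)]
        self.b = [0.0] * (5 * dim)
        # Query vector used to score each candidate parent node.
        self.q = [self.rng.uniform(-0.1, 0.1) for _ in range(dim)]

    def compose(self, left: Node, right: Node) -> Node:
        (hl, cl), (hr, cr) = left, right
        x = hl + hr  # list concatenation = [h_left; h_right]
        z = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(self.W, self.b)]
        d = self.dim
        gates = [[sigmoid(v) for v in z[k * d:(k + 1) * d]] for k in range(4)]
        i, fl, fr, o = gates
        g = [math.tanh(v) for v in z[4 * d:]]
        c = [fl[k] * cl[k] + fr[k] * cr[k] + i[k] * g[k] for k in range(d)]
        h = [o[k] * math.tanh(c[k]) for k in range(d)]
        return h, c

    def gumbel_argmax(self, logits: Vec) -> int:
        # Add Gumbel(0, 1) noise and take the argmax (hard selection). During training,
        # the straight-through estimator would use softmax(noisy / tau) for gradients.
        noisy = []
        for s in logits:
            u = max(self.rng.random(), 1e-12)  # avoid log(0)
            noisy.append((s - math.log(-math.log(u))) / self.tau)
        return max(range(len(noisy)), key=noisy.__getitem__)

    def encode(self, embeddings: List[Vec]) -> Tuple[Vec, List[Tuple[int, int]]]:
        # Leaves: h = word embedding, c = zeros (a simplification of a learned leaf layer).
        nodes: List[Node] = [(list(e), [0.0] * self.dim) for e in embeddings]
        merges: List[Tuple[int, int]] = []
        while len(nodes) > 1:
            candidates = [self.compose(nodes[k], nodes[k + 1]) for k in range(len(nodes) - 1)]
            scores = [sum(qi * hi for qi, hi in zip(self.q, h)) for h, _ in candidates]
            k = self.gumbel_argmax(scores)
            merges.append((k, k + 1))  # indices into the current node list
            # Replace the selected pair with its parent; all other nodes are kept.
            nodes = nodes[:k] + [candidates[k]] + nodes[k + 2:]
        return nodes[0][0], merges


enc = GumbelTreeLSTMSketch(dim=4, seed=42)
word_rng = random.Random(1)
sentence = [[word_rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]  # 5 toy word vectors
root, merges = enc.encode(sentence)
print(len(root), len(merges))  # 4 4
```

A sentence of n words always yields n - 1 merges, and the recorded merge order is the learned binary tree; in a full model the root (and possibly intermediate) vectors would serve as the tree-based context the abstract mentions.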