
Research On Chinese-Vietnamese Machine Translation Methods Integrating Syntactic Knowledge

Posted on: 2020-04-20 | Degree: Master | Type: Thesis
Country: China | Candidate: J Y L He | Full Text: PDF
GTID: 2515305975457434 | Subject: Computer application technology
Vietnam is an important neighbor of China and maintains close exchanges and cooperation with China in politics, economy, and culture. Under the "Belt and Road" policy, these exchanges are becoming closer still, and the need for machine translation technology is increasingly urgent. However, current machine translation technology cannot keep pace with this rapid economic and cultural development, mainly because the scarcity of Chinese-Vietnamese bilingual parallel corpora restricts the development of Chinese-Vietnamese machine translation. In recent years, machine translation based on recurrent neural networks has developed rapidly, but its gains in translation performance depend on large-scale bilingual parallel corpora. Improving translation performance by technical means when such corpora are scarce has therefore become a research hotspot. In view of the linguistic characteristics of Chinese and Vietnamese, and of the problem that the word order and syntactic structure of Chinese-Vietnamese machine translation output do not conform to the grammatical rules of the target language, this thesis studies how to integrate part-of-speech information and syntactic parse trees into a convolutional neural machine translation model. The main work of this thesis is as follows:

(1) A method for Chinese-Vietnamese convolutional neural machine translation incorporating part-of-speech information. Part-of-speech information constrains the generation of word order in machine translation. Chinese and Vietnamese differ greatly in how modifiers are ordered, which leads to inconsistent word order in Chinese-Vietnamese machine translation. This thesis therefore proposes integrating part-of-speech information into the machine translation model to regulate the word order of the translation. First, a bilingual aligned corpus with part-of-speech tags is used to generate a bilingual vocabulary annotated with parts of speech. A Chinese-Vietnamese neural machine translation model is then trained using a multi-layer convolutional neural network with convolution kernels of different sizes. The Chinese-Vietnamese bilingual corpus is encoded and decoded with the part-of-speech-annotated vocabulary, so that part-of-speech information is fused into the translation model.

(2) A method for Chinese-Vietnamese convolutional neural machine translation incorporating syntactic parse trees. Syntactic analysis determines whether the word sequence of an input sentence conforms to the prescribed grammatical rules. A syntactic parse tree is then constructed to determine the hierarchy of the sentence and the relationships between the syntactic components at each level. From the parse tree one can obtain syntactic information such as which words form phrases and which words are the subject or object of a verb. Building on the part-of-speech method, this thesis further studies a Chinese-Vietnamese convolutional neural machine translation method that incorporates syntactic parse trees. It uses the structural information in the parse tree to help the convolutional neural machine translation model acquire syntactic knowledge and thereby constrain the syntactic structure of the generated translation. The method first parses the Chinese and Vietnamese sentences with the Stanford Chinese syntactic parser and a Vietnamese syntactic parser, respectively, to obtain Chinese and Vietnamese syntactic parse trees. Each tree is then traversed depth-first to obtain the syntactic tag sequence corresponding to each leaf node. The syntactic tag sequences are then classified according to the syntactic structure of Chinese and Vietnamese. Finally, the vectorized syntactic tag sequences are integrated into the translation model through gated linear units (GLU), and the Chinese-Vietnamese neural machine translation model is trained using a multi-layer convolutional neural network, convolution kernels of different sizes, and vocabularies annotated with part-of-speech information.

(3) A Chinese-Vietnamese machine translation system incorporating syntactic knowledge. This thesis designs and implements a Chinese-Vietnamese machine translation system that combines the convolutional neural machine translation methods incorporating part-of-speech information and syntactic parse trees. The system consists mainly of a Web front-end service module and a machine translation module. The machine translation module comprises an input/output function module, a text pre-processing function module, a translation function module, and a text post-processing function module. The translation module is based on the convolutional neural machine translation framework proposed by Facebook, modified to incorporate the part-of-speech information and syntactic parse trees of Chinese and Vietnamese. The system provides important support for research on fusing Chinese-Vietnamese syntactic knowledge with convolutional neural machine translation.
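
The pre-processing described in (1) and (2) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the parser output format, so a Penn-style bracketed string is assumed; the `word|POS` token convention, the helper names `parse_bracketed`, `leaf_tag_paths`, and `pos_tokens`, and the toy sentence are all hypothetical. The sketch shows how a depth-first traversal can yield, for each leaf word, the sequence of syntactic tags on the path from the root, and how a part-of-speech-annotated vocabulary token might be formed.

```python
def parse_bracketed(text):
    """Parse a Penn-style bracketed tree into nested (label, children) tuples.

    Leaf words are kept as plain strings. The input format is an assumption.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def leaf_tag_paths(tree, path=()):
    """Depth-first traversal yielding (word, tags from root down to the word)."""
    label, children = tree
    path = path + (label,)
    for child in children:
        if isinstance(child, str):
            yield child, path
        else:
            yield from leaf_tag_paths(child, path)


def pos_tokens(tree):
    """Join each word with its part-of-speech tag (hypothetical 'word|POS' form)."""
    return [f"{word}|{path[-1]}" for word, path in leaf_tag_paths(tree)]


# Toy Chinese sentence ("I like music") with an assumed bracketed parse.
example = "(ROOT (IP (NP (PN 我)) (VP (VV 喜欢) (NP (NN 音乐)))))"
tree = parse_bracketed(example)

for word, tags in leaf_tag_paths(tree):
    print(word, " ".join(tags))

print(pos_tokens(tree))
```

In a full pipeline, the per-leaf tag sequences would next be mapped to vectors before being combined with the word representations through the gated linear units mentioned above; how that combination is wired is not specified in the abstract, so it is left out of this sketch.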
Keywords: Chinese-Vietnamese machine translation, convolutional neural network, syntactic knowledge, part-of-speech information, syntactic parse tree