Research On Incorporating Glyph Features Of Chinese Characters Into Neural Machine Translation

Posted on: 2020-11-04  Degree: Master  Type: Thesis
Country: China  Candidate: Z L Cai  Full Text: PDF
GTID: 2415330578980892  Subject: Computer Science and Technology
Abstract/Summary:
Neural machine translation (NMT) has many advantages, such as simplicity, versatility and outstanding translation performance. Since it was first proposed in 2014, it has rapidly become one of the research hotspots in the field of machine translation. The original NMT systems simply model plain text in an end-to-end manner and ignore linguistic knowledge. For this reason, many scholars have tried to incorporate linguistic features such as part of speech and dependency syntax into NMT systems, and have achieved good results. These works show that although an NMT system has a strong ability to learn from plain text, incorporating additional linguistic features can further improve it.

Chinese characters are the only ancient script still in use after thousands of years. Ancient Chinese characters, such as oracle bone inscriptions and bronze (Zhongding) inscriptions, are pictographic and ideographic scripts. As a result, even those who do not know Chinese characters or Chinese culture can often guess the meanings of Chinese words when they see these picture-like characters. Although today's Chinese characters are no longer picture-like, they still retain their original pronunciations and meanings, and over their long history of use their shapes, sounds and meanings have been integrated into a three-element system. Since Chinese character glyphs carry rich linguistic information such as pronunciation and semantics, incorporating them into the translation model may be an effective way to improve NMT performance.

In this thesis, we develop a neural machine translation model based on the glyph features of Chinese characters. The work consists of the following three parts:

(1) Incorporating glyph features of Chinese characters into character-level neural machine translation. Each Chinese character has a corresponding glyph shape, which can be expressed numerically as a font vector. In this work, we use a splicing method to obtain the glyph features from the dot-matrix image of each Chinese character, and we design two glyph feature fusion methods: a partial substitution method and an assistant learning method. We conducted experiments on a Chinese-English translation task. The results show that incorporating glyph features into character-level NMT with the assistant learning method can greatly improve translation quality.

(2) Incorporating glyph features of Chinese characters into word-level neural machine translation. Words consist of varying numbers of characters, so their font vectors cannot be obtained directly in the way that those of single characters can. In this work, we use an LSTM to learn the glyph vector representation of words. In addition, we use the spatial translation invariance provided by the filter structure of a convolutional neural network (CNN) to learn the glyph vector representation. Compared with the baseline system, these two methods and the assistant learning method from part (1) all greatly improve the translation performance of the model.

(3) Incorporating glyph features of Chinese characters into neural machine translation based on a character-word hybrid model. In this work, we improve the word embedding generation model proposed by Luong et al. and propose a word embedding decomposition model based on the linguistic properties of Chinese. In addition, we use the same methods as in parts (1) and (2) to incorporate the glyph features of Chinese characters into the translation model. The experimental results show that glyph features also benefit NMT based on the character-word hybrid model.
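
The splicing step in part (1) can be illustrated with a minimal sketch. The abstract does not specify the exact procedure, so this assumes that "splicing" means concatenating the rows of a character's dot-matrix bitmap into one flat vector. The function name `splice_glyph_vector` and the 5x5 toy bitmap are hypothetical and chosen only for illustration; a real system would presumably use larger bitmaps rendered from a font.

```python
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 pixels from a dot-matrix glyph image


def splice_glyph_vector(bitmap: Bitmap) -> List[float]:
    """Concatenate the bitmap rows, top to bottom, into one flat vector.

    Assumption: this row-wise concatenation stands in for the thesis's
    "splicing method", whose details are not given in the abstract.
    """
    if not bitmap:
        raise ValueError("bitmap must contain at least one row")
    width = len(bitmap[0])
    vector: List[float] = []
    for row in bitmap:
        if len(row) != width:
            raise ValueError("all bitmap rows must have the same width")
        vector.extend(float(pixel) for pixel in row)
    return vector


# Hypothetical 5x5 dot matrix approximating the character "十" (a cross).
TOY_GLYPH: Bitmap = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

glyph_vector = splice_glyph_vector(TOY_GLYPH)
print(len(glyph_vector))  # one entry per pixel
```

A vector of this kind has a fixed length for every character, which is why it can be attached to a character embedding directly, whereas words of varying length need a learned encoder such as the LSTM or CNN described in part (2).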
Keywords/Search Tags: Neural Machine Translation, Glyph Features of Chinese Characters, Generation Model of Word Embedding, Decomposition Model of Unregistered Word