
Research Of Statistical Translation Model Based On Distributed Compositional Semantics

Posted on: 2017-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: C C Wang
Full Text: PDF
GTID: 2308330488461926
Subject: Software engineering
Abstract/Summary:
As an important branch of artificial intelligence, statistical machine translation (SMT) has passed through three research stages: word-based, phrase-based, and syntax-based SMT. Translation quality has improved steadily along the way, but many problems remain, such as translations that fail to express the intended meaning. To address this bottleneck, many researchers in computational linguistics have turned to semantic-based SMT. This work carries out the following three studies:

(1) Statistical translation model based on distributed semantic representation of words
First, we use word vector spaces and word embeddings to obtain distributed semantic information for the source and target words in a large-scale parallel bilingual corpus. Second, the semantic information of each source word is mapped into the target-language semantic space by a non-linear projection model. We then calculate the bilingual semantic similarity within that shared semantic space. Finally, we embed the similarity into a hierarchical phrase-based translation system as a new feature. Experimental results show that the semantic information improves the performance of SMT.

(2) Statistical translation model based on compositional semantics using vector mixture
First, we obtain the semantic information of the source phrase and target phrase using vector mixture models built on the distributed semantic representations of words. Second, the source phrase information is mapped into the semantic space of the target language by a non-linear mapping method. We then calculate the bilingual translation similarity using a translation similarity model. Experimental results show that distributed compositional semantics outperforms lexical semantics in terms of SMT performance.

(3) Statistical translation model based on compositional semantics using a recursive autoencoder
Building on the previous studies, we use a recursive autoencoder, a neural network method, to obtain compositional semantics in the source and target languages respectively. We then integrate the bilingual compositional semantics into the decoding process of SMT. Finally, we compare the translation results for features obtained with linear projection and with non-linear projection.
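The steps described in the abstract (compose a phrase vector, project it into the target space, score the pair) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the thesis's implementation: the mixture is taken to be a weighted average, the non-linear projection is taken to be tanh(Wx + b), the similarity is taken to be cosine, and the function names, the matrix W, and the bias b are hypothetical. The abstract does not specify these forms, and training of the projection parameters is not shown.

```python
import math


def vector_mixture(word_vectors, weights=None):
    """Compose a phrase vector as a weighted average of its word vectors.

    Assumption: uniform weights when none are given.
    """
    n = len(word_vectors)
    if weights is None:
        weights = [1.0 / n] * n
    dim = len(word_vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, word_vectors))
            for i in range(dim)]


def project(vec, W, b):
    """Non-linear map tanh(W vec + b); W is a list of rows (hypothetical parameters)."""
    return [math.tanh(sum(w_ij * x for w_ij, x in zip(row, vec)) + b_i)
            for row, b_i in zip(W, b)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * c for a, c in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(c * c for c in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def translation_similarity(src_word_vecs, tgt_word_vecs, W, b):
    """Score a phrase pair: mix both sides, project the source side, compare."""
    src_phrase = vector_mixture(src_word_vecs)
    tgt_phrase = vector_mixture(tgt_word_vecs)
    return cosine(project(src_phrase, W, b), tgt_phrase)


def rae_compose(left, right, W_enc, b_enc, W_dec, b_dec):
    """One recursive-autoencoder step (sketch): encode two child vectors into a
    parent, reconstruct the children linearly, and return the parent together
    with half the squared reconstruction error."""
    children = left + right                      # concatenation [left; right]
    parent = project(children, W_enc, b_enc)     # tanh(W_enc [left; right] + b_enc)
    recon = [sum(w * p for w, p in zip(row, parent)) + b_i
             for row, b_i in zip(W_dec, b_dec)]
    error = 0.5 * sum((r - c) ** 2 for r, c in zip(recon, children))
    return parent, error
```

In a setup like the one the abstract describes, the value returned by `translation_similarity` would play the role of the additional feature score for a candidate phrase pair. Applying `rae_compose` bottom-up over a binary tree of word vectors would yield a phrase vector in place of the simple average.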
Keywords/Search Tags: Statistical Machine Translation, Distributed Semantics, Compositional Semantics, Neural Networks, Translation Similarity