
Topically-Informed Bilingually-Constrained Recursive Autoencoders For Statistical Machine Translation

Posted on: 2019-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Ruan
Full Text: PDF
GTID: 2415330548486867
Subject: Computer Science and Technology
Abstract/Summary:
Learning high-quality phrase vector representations is one of the important research topics in statistical machine translation (SMT). Inspired by the success of monolingual phrase embeddings, many approaches have been proposed to learn bilingual phrase embeddings for SMT. The intuition behind them is that a phrase and its correct translation should share the same semantic meaning, and thus they should be embedded close to each other in a shared embedding space. However, existing work considers only the semantic composition of the words within a phrase, while semantic information beyond the phrase is ignored, which limits the potential of the learned phrase vector representations.

In this paper, we propose Topically-informed Bilingually-constrained Recursive Autoencoders (TBRAE), which address this issue by learning contextual bilingual phrase embeddings. By incorporating contextual clues into phrase vector representations, our model substantially extends the Bilingually-constrained Recursive Autoencoders (BRAE), exploiting latent topics in two ways. Our first inspiration comes from the observation that the meanings of phrases are often context-dependent. Hence, we represent the document-level context of each phrase by its document-topic distribution, which is combined with the RAE to produce a topical phrase embedding. Our second inspiration derives from the observation that the word-topic assignments produced by a latent topic model reflect the semantic correlations between words and topics, which can be used to constrain the learning of their embeddings. To this end, we design word-topic semantic constraints that encourage words with similar topic assignments to be placed close together in the embedding space.

Compared with BRAE, the TBRAE model not only considers the document-level context beyond the phrase, but also directly models the interactions between word and topic embeddings, both of which are the bases of topical phrase embeddings. Experimental results on Chinese-English translation show that the proposed model significantly improves translation quality on the NIST test sets.
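The two ideas above can be sketched in a few lines of code. This is an illustrative toy, not the thesis implementation: the function names, the greedy left-to-right composition (BRAE learns a binary tree), the concatenation-based fusion of phrase vector and topic distribution, and the pairwise squared-distance form of the word-topic constraint are all assumptions made for the sketch.

```python
import numpy as np

def compose(left, right, W, b):
    # RAE composition step: parent = tanh(W @ [left; right] + b)
    return np.tanh(W @ np.concatenate([left, right]) + b)

def phrase_embedding(word_vecs, W, b):
    # Compose word vectors into a phrase vector, left to right.
    # (BRAE uses a learned binary tree; greedy order is a simplification.)
    h = word_vecs[0]
    for v in word_vecs[1:]:
        h = compose(h, v, W, b)
    return h

def topical_phrase_embedding(word_vecs, doc_topics, W, b, Wt, bt):
    # Fuse the RAE phrase vector with the document-topic distribution
    # (the document-level context), giving a topical phrase embedding.
    h = phrase_embedding(word_vecs, W, b)
    return np.tanh(Wt @ np.concatenate([h, doc_topics]) + bt)

def word_topic_constraint(word_vecs, topic_of):
    # Word-topic semantic constraint: sum of squared distances over
    # pairs of words assigned to the same latent topic, so that words
    # with similar topic assignments are pulled close together.
    loss = 0.0
    for i in range(len(word_vecs)):
        for j in range(i + 1, len(word_vecs)):
            if topic_of[i] == topic_of[j]:
                loss += float(np.sum((word_vecs[i] - word_vecs[j]) ** 2))
    return loss

# Tiny demo with random parameters: d = 4 embedding dims, K = 3 topics.
rng = np.random.default_rng(0)
d, K = 4, 3
W, b = rng.normal(size=(d, 2 * d)), np.zeros(d)
Wt, bt = rng.normal(size=(d, d + K)), np.zeros(d)
words = [rng.normal(size=d) for _ in range(3)]
doc_topics = np.array([0.6, 0.3, 0.1])  # e.g. an LDA posterior for the document
vec = topical_phrase_embedding(words, doc_topics, W, b, Wt, bt)
```

In training, a loss of this shape would be added to the BRAE reconstruction and translation-equivalence objectives; here it only illustrates where the topic model's two outputs (document-topic distribution and per-word topic assignments) enter the model.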
Keywords/Search Tags: Topical Phrase Embeddings, Bilingually-Constrained Recursive Autoencoders, Statistical Machine Translation