| Neural machine translation (NMT) systems are usually trained on a large number of bilingual sentence pairs and translate a text in a conventional sentence-by-sentence fashion. However, sentences in a well-formed text are connected to each other via various links that form the cohesive structure of the text. Ignoring such cross-sentence links and dependencies makes the translation of a sentence ambiguous or even inconsistent with the translations of neighboring sentences, and yields an incoherent target text for a coherent source text. Recent studies also find that translation quality drops significantly when NMT translates long sentences; from a certain point of view, a complex long sentence is equivalent to a short discourse. To address these shortcomings, this paper develops a variety of document-level neural machine translation models. The main contributions are as follows. (1) Because a complex long sentence is sometimes equivalent to a short discourse, and it is very difficult for an NMT system to translate an entire discourse at once, the translation quality of NMT drops significantly on long sentences. We propose a novel method that deals with this issue by segmenting long sentences into several clauses. We introduce a split and reordering model to collectively detect the optimal sequence of segmentation points for a long source sentence. Each segmented clause is translated by the NMT system independently into a target clause, and the translated target clauses are then concatenated without reordering to form the final translation of the long sentence. On NIST Chinese-English translation tasks, our segmentation method achieves a substantial improvement over the NMT baseline on long sentences. (2) NMT systems translate one sentence at a time, ignoring inter-sentence information. We aim to use the information in adjacent sentences of the same document to help NMT translate the current sentence. We therefore propose an Inter-Sentence Gate model, which uses the same encoder to encode two adjacent sentences and uses an inter-sentence gate to control the amount of information flowing from the preceding sentence into the translation of the current sentence. In this way, the proposed model captures the connection between sentences and uses the captured information to help document-level neural machine translation. On several NIST Chinese-English translation tasks, our experiments demonstrate that the proposed Inter-Sentence Gate model achieves substantial improvements over the baseline. (3) NMT systems translate a text in a conventional sentence-by-sentence fashion, ignoring the contextual information provided by the text, such as topic information and cross-sentence links and dependencies. To handle this issue, we propose a cache-based approach to document-level neural machine translation that captures contextual information either from recently translated sentences or from the entire document. In particular, we explore two types of caches: a dynamic cache, which stores words from the best translation hypotheses of preceding sentences, and a topic cache, which maintains a set of target-side topical words that are semantically related to the document to be translated. On this basis, we build a new layer that scores the target words in these two caches with a cache-based neural model. The probabilities estimated by the cache-based neural model are combined with the NMT probabilities into the final word prediction probabilities via a gating mechanism. Finally, the proposed cache-based neural model is trained jointly with a state-of-the-art neural machine translation system in an end-to-end manner. On the NIST Chinese-English translation tasks, our experiments demonstrate that the proposed cache-based model achieves substantial improvements over the baselines. |
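The gating step in contribution (3), where cache probabilities are combined with NMT probabilities, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the linear interpolation below, the sigmoid gate, and the names `combine_with_cache`, `cache_scores`, and `gate_logit` are all assumptions made for illustration, not the paper's actual model. In a trained system the gate value and cache scores would come from learned layers rather than being passed in by hand.

```python
import math
from typing import Dict


def sigmoid(x: float) -> float:
    """Squash a real-valued gate logit into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize raw scores over cached words into a probability distribution."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(score - top) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


def combine_with_cache(
    p_nmt: Dict[str, float],
    cache_scores: Dict[str, float],
    gate_logit: float,
) -> Dict[str, float]:
    """Interpolate NMT and cache distributions with a scalar gate.

    Assumed form (hypothetical, not taken from the paper):
        p(w) = alpha * p_nmt(w) + (1 - alpha) * p_cache(w)
    where alpha = sigmoid(gate_logit).
    """
    if not cache_scores:
        # An empty cache contributes nothing; fall back to the NMT distribution.
        return dict(p_nmt)
    alpha = sigmoid(gate_logit)
    p_cache = softmax(cache_scores)
    vocabulary = set(p_nmt) | set(p_cache)
    return {
        word: alpha * p_nmt.get(word, 0.0) + (1.0 - alpha) * p_cache.get(word, 0.0)
        for word in vocabulary
    }


# Toy example: the cache favors "shore", which was used in earlier sentences.
p_nmt = {"bank": 0.5, "shore": 0.3, "river": 0.2}
cache_scores = {"shore": 2.0, "river": 0.0}
combined = combine_with_cache(p_nmt, cache_scores, gate_logit=0.0)
```

Because both inputs are probability distributions and the gate lies strictly between 0 and 1, the combined scores still sum to one, and words present in the cache gain probability mass at the expense of words that are absent from it.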