Statistical Machine Translation Research And Applications

Posted on: 2017-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Lu
GTID: 2348330488463801
Subject: Software engineering
Abstract/Summary:
In the information age, the rapid development of the Internet has made communication between countries and regions easier and more frequent. Because language is the carrier of information, translation between languages has become increasingly important, and the large demand for translation has driven the rapid development of machine translation. Among the many translation models, the statistical machine translation model is the most widely used and performs well, and in recent years it has gradually become the core of the machine translation field.
This thesis designs and implements a phrase-based statistical machine translation system. The system is divided into four independent functional modules: data preprocessing, translation model training, language model training, and the decoder. Data preprocessing covers word segmentation and format normalization. Translation model training mainly consists of word alignment, phrase extraction, and phrase scoring. The language model is the commonly used N-gram model. The decoder is the core module of the system: it retrieves the translation candidates, computes the future translation probability (future cost), searches for the optimal path, and generates the translation. Decoding uses a stack-based search algorithm; to improve translation efficiency, the decoder is optimized with different pruning strategies, and the effect of each strategy on translation is evaluated experimentally. The result is a complete statistical machine translation system that meets the basic requirements of translation.
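The abstract lists phrase extraction and phrase scoring as the main steps of translation model training but does not give the algorithm. The sketch below assumes the standard approach: extract every phrase pair that is consistent with the word alignment, then score pairs by relative frequency. It is a minimal illustration, not the thesis's implementation; the names `extract_phrases`, `score_phrases`, and `max_len` are hypothetical, and the extraction deliberately skips the usual extension over unaligned boundary words.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical alignment format: (source index, target index) links.
Alignment = Set[Tuple[int, int]]

def extract_phrases(src: List[str], tgt: List[str], alignment: Alignment,
                    max_len: int = 4) -> List[Tuple[str, str]]:
    """Extract phrase pairs consistent with the word alignment.

    Simplification: source spans with no links are skipped, and unaligned
    words at phrase boundaries are not used to build extra, wider pairs.
    """
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no target word in [t_start, t_end] may link
            # to a source word outside [s_start, s_end].
            if any(t_start <= t <= t_end and not (s_start <= s <= s_end)
                   for (s, t) in alignment):
                continue
            pairs.append((" ".join(src[s_start:s_end + 1]),
                          " ".join(tgt[t_start:t_end + 1])))
    return pairs

def score_phrases(corpus) -> Dict[Tuple[str, str], float]:
    """Relative-frequency phrase translation probability p(target | source)."""
    pair_counts = Counter()
    for src, tgt, alignment in corpus:
        pair_counts.update(extract_phrases(src, tgt, alignment))
    src_counts = Counter()
    for (s, _t), count in pair_counts.items():
        src_counts[s] += count
    return {(s, t): count / src_counts[s]
            for (s, t), count in pair_counts.items()}
```

For a monotone one-to-one alignment of a three-word sentence, every contiguous span becomes a phrase pair (six in total); when one source word links to two target words, only spans that cover all of those links survive the consistency check.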
The experiments evaluate cube pruning, histogram pruning, and dynamic pruning, and analyze the performance and applicable scope of each strategy from the experimental data. The study concludes that the stack-based decoding algorithm achieves high translation performance but decodes slowly, because it generates a large number of redundant translation candidates; choosing a suitable pruning strategy effectively improves translation efficiency. Cube pruning showed the best overall performance and was relatively safe and stable. The performance of dynamic pruning depends on the size of the set being pruned and increases with it, so dynamic pruning performs especially well when pruning large sets of translation options.
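Two of the strategies compared above can be illustrated on a single hypothesis stack. The following Python sketch is hypothetical and only shows the core ideas: histogram pruning keeps the k best hypotheses, and cube pruning lazily enumerates the k best combinations of sorted hypotheses and translation options with a priority queue instead of scoring every pair. The abstract does not say how its dynamic pruning works, so it is not reproduced. Scores are assumed to be log-probabilities (higher is better); in a real decoder they would also include the future-cost estimate and a language model, which makes cube pruning approximate rather than exact.

```python
import heapq
from typing import List, Tuple

# Hypothetical representation: (log-probability score, partial translation).
Hyp = Tuple[float, str]

def histogram_prune(stack: List[Hyp], max_size: int) -> List[Hyp]:
    """Histogram pruning: keep only the max_size best-scoring hypotheses."""
    return heapq.nlargest(max_size, stack, key=lambda h: h[0])

def cube_prune(hyps: List[Hyp], options: List[Hyp], k: int) -> List[Hyp]:
    """Core of cube pruning: pop the k best (hypothesis, option) pairs lazily.

    Both inputs must be sorted best-first. Scores are simply added here, so
    the enumeration is exact; a non-additive language model score would make
    it an approximation.
    """
    if not hyps or not options:
        return []
    results: List[Hyp] = []
    # Max-heap via negated scores; start from the top-left corner of the grid.
    heap = [(-(hyps[0][0] + options[0][0]), 0, 0)]
    seen = {(0, 0)}
    while heap and len(results) < k:
        neg_score, i, j = heapq.heappop(heap)
        results.append((-neg_score, hyps[i][1] + " " + options[j][1]))
        # Push the two grid neighbours: next hypothesis, next option.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(hyps) and nj < len(options) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(hyps[ni][0] + options[nj][0]), ni, nj))
    return results
```

With two hypotheses and two options and k = 2, cube pruning scores only three of the four combinations before stopping; the savings grow with the stack and option-list sizes, which is why it addresses the redundant-candidate problem described above.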
Keywords: statistical machine translation, phrase extraction, stack-based decoding algorithm, histogram pruning, cube pruning