Research And Its Implementation On Translation Recommendation System Fusing Search Technology

Posted on:2017-03-05

Degree:Master

Type:Thesis

Country:China

Candidate:W Wang

Full Text:PDF

GTID:2348330503492885

Subject:Computer Science and Technology

Abstract/Summary:


Translation retrieval combines machine translation and information retrieval. Machine translation studies how to use a computer to translate one natural language into another. Information retrieval returns the documents related to a user's query. Conventional translation retrieval methods are generally based on parallel corpora, and their effectiveness depends mainly on the quality of the parallel corpus. We build a Chinese-English translation recommendation system based on a monolingual corpus. It casts translation as a retrieval problem, which avoids the reliance of conventional approaches on parallel corpora that are difficult to collect. It also improves the fluency of the final translation references. The system combines query translation with information retrieval. Given a set of Chinese queries, the query-translation component generates N-best results in the target language, and the information retrieval component computes the similarity between the query and each document. In this dissertation, we study methods for Chinese-English translation that can also be applied to other languages. The primary work includes the following aspects:

First, a phrase-based machine translation system is designed and implemented. ICTCLAS2011 is used to process the original Chinese language materials, and GIZA++ is used for word alignment. A log-linear model is used to combine the features, and minimum error rate training is used to estimate their weights. We then implement decoding and automatic BLEU evaluation. Experiments show that the system performs competitively with a 4-gram model and achieves a better BLEU score.

Second, the retrieval model algorithm is improved on the basis of Apache Lucene. A baseline retrieval model based on the vector space model is constructed. Considering the word-order consistency between the N-best results and the translation candidates, we propose an optimized retrieval model that uses Levenshtein distance to order the retrieval results properly.

Third, a translation recommendation algorithm that uses retrieval technology over a monolingual corpus is proposed. Results are scored by coupling the translation and retrieval subsystems, and the algorithm returns the translation candidates ranked by their final score. Experiments show a maximum F-measure of 70.83%.
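The idea of reordering retrieval results by word-order consistency can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the function names, the whitespace tokenization, the bag-of-words cosine similarity standing in for the vector space model, and the linear interpolation weight `alpha`. It is not the thesis's implementation and does not use Lucene.

```python
import math
from collections import Counter


def word_edit_distance(a, b):
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cosine_similarity(a, b):
    """Cosine similarity of raw term-frequency vectors (order-insensitive)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def rank_candidates(hypothesis, candidates, alpha=0.5):
    """Rank candidate sentences against one machine-translated hypothesis.

    alpha is a hypothetical weight: alpha on term overlap,
    (1 - alpha) on word-order similarity derived from edit distance.
    """
    hyp = hypothesis.lower().split()
    scored = []
    for text in candidates:
        cand = text.lower().split()
        longest = max(len(hyp), len(cand)) or 1
        order_sim = 1.0 - word_edit_distance(hyp, cand) / longest
        score = alpha * cosine_similarity(hyp, cand) + (1 - alpha) * order_sim
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]
```

Term overlap alone cannot distinguish a fluent sentence from a scrambled one that uses the same words; the edit-distance term penalizes the scrambled candidate, which is the motivation the abstract gives for adding Levenshtein distance to the ranking.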

Keywords/Search Tags:

Machine translation (MT), Information retrieval (IR), Natural language processing (NLP), Levenshtein distance


Related items

1	Network Machine Translation, Information Flow Processing
2	Intelligent Machine Translation In The Context Of Information Processing
3	Research On Machine Learning For Natural Language Processing And Transmission
4	Based On The Generalization Of The Instances Of Machine Translation
5	Research On The Construction And Analysis Of Common Sense Corpora For Natural Language Generation
6	Research On Multimodal And Semi-parametric Neural Machine Translation Integrating External Information
7	Using Latent Information for Natural Language Processing Tasks
8	Speculative Decoding In Neural Machine Translation
9	Research And Application Of Natural Language Processing In Information Retrieval
10	Parallel Sequence Decoding In Neural Machine Translation