| Machine translation (MT) is the process of converting one natural language into another by machine. Research on Tibetan-English machine translation is of great significance for inheriting and carrying forward the fine national culture, promoting cultural exchange, and facilitating the communication of ideas. It is also of great practical significance for serving the national Belt and Road strategy and for promoting the social, economic, educational, and cultural development of Tibetan areas. MT is a branch of computational linguistics that draws on computer technology, mathematics, cognitive science, linguistics, information theory, and other disciplines, and it is one of the ultimate goals of artificial intelligence. Research on phrase-based Tibetan-English statistical machine translation can therefore promote the substantive development of Tibetan computational linguistics, and it has important scientific and practical value. In the 1990s, building on Weaver’s idea, Peter Brown of IBM put forward a mathematical model of statistical machine translation that treats machine translation as a noisy-channel problem. Machine translation based on this mathematical model performed far better than traditional rule-based machine translation. Since then, statistical machine translation has become the focus of machine translation research, and word-based, phrase-based, and syntax-based translation models have been proposed. Among these, the phrase-based model is the mainstream because of its simplicity, high robustness, and good translation performance, and it has become the focus of current research and application. Based on the phrase-based statistical machine translation model, this thesis examines the key technical problems of phrase alignment, phrase extraction, the reordering model, parameter training, and decoding
of the phrase translation model, and takes the phrase-based statistical machine translation system of the School of Information Science and Technology of Tibet University as the experimental platform. The thesis tries to improve the performance of this system by improving the key components of the phrase translation model. Specifically, the research content mainly includes the following aspects: (1) Word alignment: This thesis focuses on the word alignment technology of IBM Models 1-5, reviews related work in the field of word alignment, and uses a discriminative word alignment method based on IBM Model 4 to address the shortcomings of one-directional word alignment. (2) Phrase extraction: This thesis mainly introduces Och's technique for extracting contiguous phrase pairs and reviews related work in the field of phrase extraction. According to the characteristics of the Tibetan language, an improved phrase extraction method is used to extract more phrase pairs from the word-aligned Tibetan-English corpus, but it also extracts many wrong phrase pairs. To address this, we apply an effective filtering method to the Tibetan-English phrase translation probability table and remove most of the wrong phrase pairs, ensuring the accuracy of the Tibetan-English phrase translation model. (3) Reordering model: The word order differences between Tibetan and English are relatively complex. To address this problem, this thesis studies the structure of modern Tibetan phrases in depth, and collates and summarizes 29 phrase structure rules covering three main phrase types of modern Tibetan: noun phrases, verb phrases, and adjective phrases. On this basis, it analyzes and compares the syntactic structures of Tibetan and English, summarizes 14 common word order differences between the two languages, and puts forward a Tibetan sentence reordering model
based on syntactic information in order to improve the performance of Tibetan-English machine translation. (4) Tuning the model feature weights: Within the framework of the log-linear model, this thesis studies in detail the decoding process of the minimum error rate training (MERT) method; 16 translation features are used in the experiments, and the optimal parameters are obtained after 20 training iterations. |
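
To make the phrase extraction step in aspect (2) concrete, the following is a minimal sketch of extracting contiguous phrase pairs that are consistent with a word alignment, in the spirit of Och's approach. It is not the improved extraction method or the filtering method of this thesis. The function name `extract_phrase_pairs`, the `max_len` limit, and the toy tokens and alignment are all illustrative assumptions, and the sketch returns only the tightest target span for each source span (it does not extend phrases over unaligned words).

```python
from typing import List, Set, Tuple

# A word alignment is a set of (source index, target index) links.
Alignment = Set[Tuple[int, int]]


def extract_phrase_pairs(src: List[str], tgt: List[str],
                         alignment: Alignment,
                         max_len: int = 4) -> Set[Tuple[str, str]]:
    """Return contiguous phrase pairs that are consistent with the alignment.

    A pair is consistent when every link touching the target span
    originates inside the source span (hypothetical helper, simplified).
    """
    pairs: Set[Tuple[str, str]] = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue  # skip source spans with no aligned words
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Reject the pair if a target word in the span links outside the source span.
            consistent = all(s_start <= s <= s_end
                             for (s, t) in alignment
                             if t_start <= t <= t_end)
            if consistent:
                pairs.add((" ".join(src[s_start:s_end + 1]),
                           " ".join(tgt[t_start:t_end + 1])))
    return pairs


# Placeholder tokens (not real corpus data): a subject-object-verb source
# sentence aligned to a subject-verb-object target sentence.
toy_src = ["s_subj", "s_obj", "s_verb"]
toy_tgt = ["t_subj", "t_verb", "t_obj"]
toy_alignment = {(0, 0), (1, 2), (2, 1)}

pairs = extract_phrase_pairs(toy_src, toy_tgt, toy_alignment)
for source_phrase, target_phrase in sorted(pairs):
    print(f"{source_phrase} ||| {target_phrase}")
```

In this toy case the span `s_subj s_obj` yields no phrase pair, because its target span would also cover `t_verb`, which is linked to a source word outside the span. Crossing links of this kind are what make a reordering model necessary for language pairs with different word orders.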