Research On The Method Of Chinese-Lao Bilingual Word Alignment And Dependency Tree Construction
Posted on:2018-09-22 | Degree:Master | Type:Thesis | Country:China | Candidate:R C Yin | GTID:2358330518460444 | Subject:Computer application technology
Abstract/Summary:
With the rapid development of science, technology, and the economy, global interconnection has become an irresistible trend, and cross-language information exchange is deepening first of all. The multilingual information on the web is massive and changes dynamically, so processing it by manual translation is inconceivable. The only practical solution is to make full use of machine translation technology to provide automatic translation services, which has set off a wave of machine translation research. Language communication and understanding are the basis of economic and cultural exchange between countries, and China and Laos are no exception. In-depth study of the Chinese-Lao language pair also helps build Chinese-Lao bilingual corpus resources.

Bilingual word alignment is a core task in NLP. It extracts translation knowledge from a bilingual parallel corpus and is one of the main sources of knowledge for machine translation. It provides basic support for many applications in natural language research, such as dependency tree construction, bilingual dictionary compilation, machine translation, and bilingual information extraction. This thesis studies automatic Chinese-Lao word alignment and constructs a bilingual parallel corpus of a certain scale, which plays an important supporting role in Chinese-Lao information processing. It discusses Chinese-Lao word alignment methods and builds a Lao dependency treebank by means of Chinese-Lao word alignment. The work covers the following three aspects:

(1) Given the differences in grammar between Chinese and Lao, we discuss the linguistic characteristics of the two languages, focusing on the different order of modifier and head word in Chinese and Lao sentence structure.

(2) We propose a Chinese-Lao automatic word alignment algorithm that incorporates syntactic features. Chinese and Lao differ greatly in grammar and syntactic structure, which makes automatic word alignment difficult. To address this, we present an automatic alignment method that incorporates a variety of feature constraint models. We first analyze and select useful features of Chinese-Lao word pairs and combine them into multiple feature constraint models. We then combine these models in a log-linear framework, train the model with the minimum error rate algorithm, and obtain the automatic alignment of Chinese and Lao words. In the experiments the baseline is IBM Model 3, and the results show that the proposed method aligns well and clearly outperforms the baseline.

(3) We put forward a method for building a Lao dependency treebank from a word-aligned Chinese-Lao bilingual corpus. Because Lao has been little studied, no relatively large Lao dependency treebank has been built. On the basis of the word-aligned Chinese-Lao parallel corpus, we first perform dependency parsing on the Chinese sentences in the corpus. Then, based on the linguistic characteristics of Lao, the dependency relations of each Chinese sentence are mapped onto the Lao sentence through the word alignment according to dependency rules, and finally the dependency tree of the Lao sentence is generated. The results show that the accuracy of this method is significantly higher than that of machine learning methods. This approach simplifies the manual annotation work in constructing the Lao dependency treebank, which saves considerable manpower and material resources. Given the scarcity of Lao corpora, this approach can automatically build a high-quality Lao dependency treebank.

Keywords/Search Tags: Chinese, Lao, Word Alignment, Log-linear model, Lao Dependency Treebank, Chinese Dependency Parsing
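
The projection step described in aspect (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `project_dependencies`, the `ROOT` marker, and the one-to-one alignment dictionary are all assumptions made for illustration, and the Lao-specific dependency rules mentioned in the abstract are not modelled. The sketch only shows how head indices from a parsed Chinese sentence could be carried over to Lao tokens through a word alignment.

```python
from typing import Dict, List, Optional

ROOT = -1  # hypothetical marker meaning "this token is the sentence root"


def project_dependencies(
    zh_heads: List[int],
    alignment: Dict[int, int],
    lo_len: int,
) -> List[Optional[int]]:
    """Project Chinese head indices onto Lao tokens via a word alignment.

    zh_heads[i] is the head index of Chinese token i (ROOT for the root).
    alignment maps a Chinese token index to a Lao token index
    (assumed one-to-one here, which real alignments often are not).
    Returns one entry per Lao token: its projected head index, ROOT,
    or None when no head could be projected.
    """
    lo_heads: List[Optional[int]] = [None] * lo_len
    for zh_dep, zh_head in enumerate(zh_heads):
        lo_dep = alignment.get(zh_dep)
        if lo_dep is None:
            continue  # unaligned Chinese word: nothing to project
        if zh_head == ROOT:
            lo_heads[lo_dep] = ROOT
            continue
        lo_head = alignment.get(zh_head)
        if lo_head is not None and lo_head != lo_dep:
            lo_heads[lo_dep] = lo_head
    return lo_heads


# Toy example with glosses instead of real Lao words.
# Chinese order: I(0) buy(1) new(2) book(3); the modifier precedes its head.
# Lao order:     I(0) buy(1) book(2) new(3); the modifier follows its head.
zh_heads = [1, ROOT, 3, 1]
alignment = {0: 0, 1: 1, 2: 3, 3: 2}
lo_heads = project_dependencies(zh_heads, alignment, lo_len=4)
print(lo_heads)
```

For the toy sentence the projected heads are `[1, -1, 1, 2]`: the Lao modifier at position 3 attaches to the noun at position 2 even though the word order is reversed relative to Chinese. Tokens left as `None` would still need manual annotation or extra rules.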