Research On The Construction Method Of Lao-Chinese Bilingual Corpus With Thai Language As The Pivot

Posted on: 2020-03-25  Degree: Master  Type: Thesis
Country: China  Candidate: N ( N Y I A N U N O U L A O
GTID: 2435330599955753  Subject: Communication and Information System
Abstract/Summary:
The Lao-Chinese bilingual corpus is an important data resource for Chinese-Lao machine translation and cross-language retrieval. Lao is a low-resource Southeast Asian language: Chinese-Lao parallel resources are scarce, and obtaining Lao-Chinese parallel data directly from the Internet is very difficult. Lao and Thai, however, are relatively similar languages, and Chinese-Thai bilingual resources are relatively abundant. The thesis therefore exploits the similarity between Thai and Lao and proposes a method for constructing a Lao-Chinese bilingual parallel corpus with Thai as the pivot language. Experiments show that the proposed method has theoretical significance and practical value for the construction of Chinese-Lao bilingual corpora. The research work of the thesis is mainly reflected in the following aspects:

1. Web crawler technology is used to automatically collect a Chinese-Thai bilingual parallel corpus of a certain scale from Chinese-Thai bilingual news websites, Wikipedia, and Chinese-Thai bilingual learning websites. The data are manually verified to build a Chinese-Thai sentence-aligned corpus and a Lao-Thai sentence-aligned corpus. The similarities and differences between Lao and Thai are analyzed in terms of word formation, pronunciation, and syntax.

2. A method for constructing a Lao-Chinese bilingual corpus with Thai as the pivot language is proposed. First, Thai sentences are selected from the constructed Chinese-Thai sentence-aligned corpus and their words are translated using a Lao-Thai bilingual dictionary; the resulting Lao word sequence serves as a candidate Lao sentence. The candidate Lao sentence is then corrected with a trained Lao language model. Finally, a Lao-Thai bilingual parallel sentence pair classification model is built with a convolutional network and a bidirectional LSTM. Lao-Thai parallel sentence pairs are extracted, and the Lao-Chinese bilingual parallel corpus is constructed with Thai as the pivot language. Experiments show that the proposed model achieves an accuracy of 72.62% and a recall of 70.02%.

3. The Lao-Thai bilingual parallel sentence pair extraction model is built on the TensorFlow platform, and a prototype system for Thai-pivoted Chinese-Lao bilingual corpus construction is developed with Java EE technology.
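The pivot step described in item 2 can be illustrated with a small sketch. This is not the thesis implementation: the function names, the placeholder tokens (`th_*`, `lo_*`, `zh_*`), and the use of raw bigram counts as a stand-in for the trained Lao language model are all assumptions made for illustration, and the convolutional/bidirectional-LSTM sentence pair classifier is not shown.

```python
from collections import Counter
from itertools import product


def candidate_sentences(thai_tokens, th2lo):
    """Expand a tokenized Thai sentence into candidate Lao word sequences.

    th2lo maps a Thai word to a list of possible Lao translations
    (a hypothetical dictionary format). Unknown words are kept unchanged.
    """
    options = [th2lo.get(tok, [tok]) for tok in thai_tokens]
    return [list(seq) for seq in product(*options)]


def bigram_counts(lao_corpus):
    """Count bigrams in a tokenized monolingual Lao corpus.

    Assumption: raw bigram counts stand in for the trained Lao language model.
    """
    counts = Counter()
    for sent in lao_corpus:
        padded = ["<s>"] + sent + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


def score(seq, counts):
    """Score a candidate by summing the corpus counts of its bigrams."""
    padded = ["<s>"] + seq + ["</s>"]
    return sum(counts[bg] for bg in zip(padded, padded[1:]))


def pivot_pairs(zh_th_pairs, th2lo, counts):
    """Turn (Chinese, Thai tokens) pairs into (Chinese, Lao tokens) pairs."""
    result = []
    for zh, thai_tokens in zh_th_pairs:
        candidates = candidate_sentences(thai_tokens, th2lo)
        best = max(candidates, key=lambda s: score(s, counts))
        result.append((zh, best))
    return result


# Placeholder data only; these are not real Thai, Lao, or Chinese words.
th2lo = {"th_eat": ["lo_eat_a", "lo_eat_b"], "th_rice": ["lo_rice"]}
counts = bigram_counts([["lo_eat_b", "lo_rice"]])
pairs = pivot_pairs([("zh_sentence_1", ["th_eat", "th_rice"])], th2lo, counts)
print(pairs)
```

Enumerating every combination of dictionary entries grows exponentially with sentence length, so a practical system would more likely use beam search or a similar pruned decoder; the exhaustive version is kept here only because it is easy to follow.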
Keywords/Search Tags: linguistic similarity, Neuro linguistic model, convolutional neural network, bidirectional LSTM, Lao-Chinese bilingual parallel corpus