Research On The Construction Method Of Chinese-Myanmar Bilingual Theme Model With Multiple Features

Posted on:2018-04-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y K Wang

Full Text:PDF

GTID:2358330518960491

Subject:MEMS

Abstract/Summary:

The Chinese-Burmese bilingual parallel corpus is a basic resource for research on Chinese-Burmese machine translation, cross-language retrieval, parallel sentence extraction and bilingual entity extraction. A cross-language topic model analyzes documents written in different languages and can compute the correlation between them at the semantic level, which gives strong support for acquiring Chinese-Burmese documents. Building a Chinese-Burmese bilingual topic model is therefore of real significance for acquiring Chinese-Burmese documents. This thesis takes corpus construction as its starting point and obtains comparable corpora through the topic model. The main contributions are as follows.

(1) Construction of a Chinese-Burmese parallel corpus. Chinese-Burmese bilingual text resources are scarce, and no public, authoritative Chinese-Burmese corpus exists. Building a bilingual topic model requires a certain amount of Chinese-Burmese parallel documents as a training set, and the quality of these parallel documents affects the topic model trained on them. This thesis describes in detail how Chinese-Burmese bilingual text is acquired from three sources: web pages, electronic magazines and the WeChat platform. For web pages, it details how text is collected automatically with a web crawler; for electronic magazines and the WeChat platform, it describes the manual acquisition process. Finally, the resources are integrated into a Chinese-Burmese bilingual parallel corpus, and the corresponding data storage methods are described.

(2) A Chinese-Burmese bilingual topic model based on context features. The model builds on the bilingual LDA topic model and integrates the context features of the text. Bilingual LDA exploits the correspondence between parallel texts: the two documents of a parallel pair share the same document-topic distribution. Fusing context features addresses the fact that the base model ignores text structure; in essence, the fusion reduces the negative effect of the raw word-frequency distribution on the topics of a text. Experimental results show that the proposed Chinese-Burmese bilingual topic model with fused context features produces better document-topic distributions.

(3) A Chinese-Burmese bilingual topic model based on semantic extension. This model builds on the context-feature model of the previous chapter and further integrates a Chinese-Burmese semantic dictionary. By analyzing and processing the dictionary, a Burmese semantic extension set is constructed for each Chinese word. Words are weighted by their context features, a threshold is set, and every word whose weight exceeds the threshold is expanded with its corresponding Burmese extension set. This semantic extension addresses the problem that a Burmese word can have several different translations. Finally, the context features and the semantic extension features are fused into the bilingual LDA model, and comparative experiments show that the resulting multi-feature bilingual topic model performs better than the models it is compared against.
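
The abstract does not give the sampling equations for bilingual LDA. The following is a minimal Python sketch of a standard two-language LDA trained by collapsed Gibbs sampling, assuming that each parallel Chinese-Burmese pair shares one document-topic distribution θ while each language keeps its own topic-word distribution. A token w in language l of pair d is resampled with probability proportional to (n_dk + α)(n_kw^l + β) / (n_k^l + V_l β), where n_dk pools the topic counts of both languages. It does not include the thesis's context features; the function name `bilingual_lda`, the language keys `"zh"`/`"my"`, the hyperparameter values and the placeholder Burmese tokens are illustrative assumptions.

```python
import random
from collections import defaultdict


def bilingual_lda(pairs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for two-language LDA (hypothetical sketch).

    Each parallel pair shares one document-topic distribution (theta);
    each language keeps its own topic-word counts (phi).
    """
    rng = random.Random(seed)
    langs = ("zh", "my")
    vocab_size = {l: len({w for p in pairs for w in p[l]}) for l in langs}
    n_dk = [[0] * num_topics for _ in pairs]  # pooled over both languages
    n_kw = {l: [defaultdict(int) for _ in range(num_topics)] for l in langs}
    n_k = {l: [0] * num_topics for l in langs}
    z = []

    # Random initial topic assignment for every token.
    for d, pair in enumerate(pairs):
        z.append({})
        for l in langs:
            z[d][l] = []
            for w in pair[l]:
                k = rng.randrange(num_topics)
                z[d][l].append(k)
                n_dk[d][k] += 1
                n_kw[l][k][w] += 1
                n_k[l][k] += 1

    topics = range(num_topics)
    for _ in range(iters):
        for d, pair in enumerate(pairs):
            for l in langs:
                for i, w in enumerate(pair[l]):
                    k = z[d][l][i]
                    # Take the current token out of the counts.
                    n_dk[d][k] -= 1
                    n_kw[l][k][w] -= 1
                    n_k[l][k] -= 1
                    # Shared document term times language-specific word term.
                    weights = [
                        (n_dk[d][t] + alpha)
                        * (n_kw[l][t][w] + beta)
                        / (n_k[l][t] + vocab_size[l] * beta)
                        for t in topics
                    ]
                    k = rng.choices(topics, weights=weights)[0]
                    z[d][l][i] = k
                    n_dk[d][k] += 1
                    n_kw[l][k][w] += 1
                    n_k[l][k] += 1

    # Point estimate of the shared theta for each parallel pair.
    theta = [
        [(row[k] + alpha) / (sum(row) + num_topics * alpha) for k in topics]
        for row in n_dk
    ]
    return theta, n_kw


# Toy parallel pairs; the Burmese tokens are hypothetical placeholders.
pairs = [
    {"zh": ["经济", "贸易", "经济"], "my": ["my_econ", "my_trade"]},
    {"zh": ["足球", "比赛"], "my": ["my_football", "my_match", "my_football"]},
]
theta, topic_word = bilingual_lda(pairs, num_topics=2, iters=50)
```

Because the document-topic counts are pooled over both languages, the Burmese half of a pair is pulled toward the same topics as its Chinese half, which is what allows documents in the two languages to be compared through a common topic space.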

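The abstract describes the semantic-extension step only in outline. Below is a minimal Python sketch of one reading of it: dictionary entries, given as hypothetical (Chinese, Burmese) pairs, are grouped into a Burmese extension set per Chinese word, and the Burmese side of a parallel pair is expanded for every Chinese word whose weight exceeds a threshold. The word weights are assumed to be precomputed by the context-feature weighting, which is not reproduced here; the function names, the rule of not repeating translations already present and the placeholder Burmese tokens are illustrative assumptions.

```python
from collections import defaultdict


def build_extension_sets(dictionary_entries):
    """Group (chinese, burmese) dictionary pairs into one Burmese
    extension set per Chinese headword (hypothetical helper)."""
    ext = defaultdict(set)
    for zh, my in dictionary_entries:
        ext[zh].add(my)
    return dict(ext)


def expand_burmese(zh_doc, my_doc, weights, ext_sets, threshold):
    """Append the Burmese extension set of every Chinese word whose weight
    exceeds the threshold; translations already present are not repeated."""
    expanded = list(my_doc)
    present = set(my_doc)
    for w in dict.fromkeys(zh_doc):  # unique Chinese words, original order
        if weights.get(w, 0.0) > threshold:
            for t in sorted(ext_sets.get(w, ())):  # sorted for a stable order
                if t not in present:
                    expanded.append(t)
                    present.add(t)
    return expanded


# Hypothetical dictionary: one Chinese word with two Burmese translations.
entries = [("经济", "my_econ_1"), ("经济", "my_econ_2"), ("足球", "my_ball_1")]
ext = build_extension_sets(entries)
weights = {"经济": 0.8, "足球": 0.2}  # assumed precomputed context weights
expanded = expand_burmese(["经济", "足球", "经济"], ["my_econ_1", "my_trade"],
                          weights, ext, threshold=0.5)
# expanded == ["my_econ_1", "my_trade", "my_econ_2"]
```

Only the high-weight word 经济 is expanded, so its second translation is added to the Burmese document while the low-weight word 足球 is left alone; the threshold keeps rare or uninformative words from flooding the Burmese side with extra translations.
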
Keywords/Search Tags:

burmese, parallel corpus, LDA model, bilingual topic model, context feature, semantic extension

Related items

1	Research On Bilingual Entity Extraction Method Based On Chinese-Burmese Bilingual Corpus
2	Research On Large-Scale Bilingual Parallel Corpus Extraction From The Web
3	Research On Chinese-Thai Bilingual Corpus Mining Method For Internet News
4	Research On Chinese-Myanmar Neural Machine Translation Method With Monolingual Corpus
5	Research On The Application Of Chinese-Burmese Bilingual Sentence-level Embedding Semantic Representation Method Based On Neural Network
6	Research On Bilingual Text Clustering Based On Semantic Duality Model
7	Research On Acquiring Bilingual Parallel Sentences And Building Corpus
8	Design And Implementation Of Automatic Construction System Of English-chinese Parallel Corpus
9	Bilingual Word Embedding Based Word Alignment On Large-Scale Corpus
10	Research And Application Of Chinese Word Segmentation Based On English-Chinese Parallel Corpus