Research On Bilingual Sentence Alignment Including Professional English-Chinese Unknown Words

Posted on:2013-08-01

Degree:Master

Type:Thesis

Country:China

Candidate:L L Quan

Full Text:PDF

GTID:2268330395986730

Subject:Computer application technology

Abstract/Summary:

Natural language processing (NLP) is an important field of computer science and artificial intelligence. It studies the theories and methods that allow people and computers to communicate effectively in natural language. Machine learning is widely applied in NLP research, and a prerequisite for such research is a large-scale corpus. Research on Chinese-English bilingual corpora that contain professional (domain-specific) unknown words is lacking. As a result, machine translation of specialized text is unprofessional and unbalanced, and this gap motivates the present dissertation.

The goal of this dissertation is to build a bilingual sentence alignment system that refines segment-level (paragraph) alignment into sentence-level alignment. The work consists of three main parts.

First, we designed an evaluation function for sentence alignment, a length-based sentence alignment algorithm, and a search algorithm that finds the optimal sequence of sentence alignments. We downloaded bilingual pages from the bilingual website CNKI (China National Knowledge Infrastructure), parsed them, removed the useless page markup, retained the bilingual content, and built a bilingual corpus aligned at the segment level. We also kept the bilingual keywords of the abstracts provided on the website.

Second, we extracted dictionaries from the translation software StarDict, analyzed their original format, and converted them into a custom format for the bilingual sentence alignment system. The English-Chinese keywords extracted in the previous step were added to this dictionary, which enlarges its vocabulary and improves its coverage of professional terms.

Finally, we applied English stem extraction to reduce the complexity of processing English word forms and to improve system efficiency. We implemented the bilingual sentence alignment system and ran comparative experiments with different parameter settings to test its performance.
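
The abstract names an evaluation function, a length-based alignment algorithm, and a search for the optimal sequence of sentence alignments, but it gives no formulas. The sketch below assumes a Gale and Church style formulation. Each candidate bead (1-1, 1-0, 0-1, 2-1, 1-2, 2-2) is scored by a negative log probability built from a normal model of the length difference plus a prior. Dynamic programming then finds the lowest-cost path. The names `match_cost` and `align`, the priors, and the `ratio`/`variance` defaults are hypothetical illustrations, not the thesis's tuned values.

```python
import math

# Hypothetical bead priors: (source sentences, target sentences) -> probability,
# of the kind used in Gale and Church style aligners.
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099, (0, 1): 0.0099,
    (2, 1): 0.089, (1, 2): 0.089,
    (2, 2): 0.011,
}


def match_cost(len_src, len_tgt, bead, ratio=1.0, variance=6.8):
    """Evaluation function for one bead: -log P(length difference) - log P(bead)."""
    mean = (len_src + len_tgt / ratio) / 2
    if mean == 0:
        return math.inf
    delta = (len_tgt - ratio * len_src) / math.sqrt(variance * mean)
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|delta|)).
    p_delta = max(math.erfc(abs(delta) / math.sqrt(2)), 1e-300)
    return -math.log(p_delta) - math.log(PRIORS[bead])


def align(src_lens, tgt_lens, ratio=1.0, variance=6.8):
    """Search for the minimum-cost bead sequence by dynamic programming.

    src_lens / tgt_lens are sentence lengths (e.g. in characters) of one
    already aligned segment pair. Returns (beads, total_cost), where each bead
    is (source sentence indices, target sentence indices).
    """
    n, m = len(src_lens), len(tgt_lens)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every cell's predecessors are final.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == math.inf:
                continue
            for di, dj in PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + match_cost(
                    sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), (di, dj), ratio, variance
                )
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)
    # Trace back from the end to recover the optimal bead sequence.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads, cost[n][m]
```

For English-Chinese text, `ratio` would be the expected number of Chinese characters per English character. It is a parameter one would expect to tune experimentally rather than fix in advance.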
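
The first part of the work also removes useless page markup from the downloaded pages while keeping the bilingual text. Below is a minimal sketch using Python's standard `html.parser` module. It drops `<script>`/`<style>` content, splits text at block-level tags, and separates Chinese from English blocks by the share of CJK ideographs. The abstract does not describe the actual CNKI page structure, so the tag sets, the 0.3 threshold, and the helper names (`BlockTextExtractor`, `is_chinese`, `extract_bilingual`) are assumptions.

```python
from html.parser import HTMLParser

# Hypothetical tag sets: block tags end a text segment, skip tags are dropped.
BLOCK_TAGS = {"p", "div", "br", "li", "td", "h1", "h2", "h3"}
SKIP_TAGS = {"script", "style"}


class BlockTextExtractor(HTMLParser):
    """Collect visible text, one entry per block-level element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def flush(self):
        # Collapse whitespace and store the pending text as one block.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)


def is_chinese(text, threshold=0.3):
    """Treat a block as Chinese if enough non-space characters are CJK ideographs."""
    chars = [ch for ch in text if not ch.isspace()]
    cjk = sum(1 for ch in chars if "\u4e00" <= ch <= "\u9fff")
    return bool(chars) and cjk / len(chars) >= threshold


def extract_bilingual(html):
    """Return (Chinese blocks, English blocks) from one page."""
    parser = BlockTextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing text outside a block tag
    zh = [b for b in parser.blocks if is_chinese(b)]
    en = [b for b in parser.blocks if not is_chinese(b)]
    return zh, en
```

Pairing the Chinese and English blocks into segment-aligned entries would depend on the page layout, which the sketch does not attempt to guess.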
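
For the second part, a StarDict dictionary consists of an `.ifo` description file, an `.idx` index, and a `.dict` data file (often compressed as `.dict.dz`). The sketch below reads an uncompressed index and data file. It assumes `idxoffsetbits=32`, so each index entry is a NUL-terminated UTF-8 headword followed by a big-endian 32-bit offset and a 32-bit size, and it assumes plain-text definitions. The thesis's custom format is not specified, so the tab-separated `word<TAB>sense;sense` output and the function names `parse_stardict` and `to_custom_format` are hypothetical. The same function also merges in the keyword pairs extracted from the abstracts.

```python
import struct


def parse_stardict(idx_bytes, dict_bytes):
    """Return (headword, definition) pairs from raw .idx and .dict contents.

    Assumes idxoffsetbits=32 (use ">QI" for 64-bit offsets) and UTF-8
    plain-text definitions with no per-field type markers.
    """
    entries = []
    pos = 0
    while pos < len(idx_bytes):
        end = idx_bytes.index(b"\0", pos)  # headword is NUL-terminated
        word = idx_bytes[pos:end].decode("utf-8")
        offset, size = struct.unpack(">II", idx_bytes[end + 1:end + 9])
        pos = end + 9
        definition = dict_bytes[offset:offset + size].decode("utf-8")
        entries.append((word, definition))
    return entries


def to_custom_format(entries, keyword_pairs):
    """Merge dictionary entries and keyword pairs into a hypothetical
    tab-separated format: one 'english<TAB>sense1;sense2' line per headword."""
    merged = {}
    for word, definition in list(entries) + list(keyword_pairs):
        key = word.strip().lower()
        senses = merged.setdefault(key, [])
        # Normalize the full-width Chinese semicolon before splitting senses.
        for sense in definition.replace("；", ";").split(";"):
            sense = sense.strip()
            if sense and sense not in senses:
                senses.append(sense)
    return "\n".join(f"{w}\t{';'.join(s)}" for w, s in sorted(merged.items()))
```

Lower-casing the key lets a keyword such as "Sentence alignment" merge with an existing dictionary entry; whether the original system did this is not stated.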
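
The final part applies English stem extraction so that inflected forms map to a single dictionary key. The abstract does not name the stemmer. Porter's algorithm is a common choice, but the sketch below is a deliberately crude, hypothetical suffix stripper that only illustrates the idea: stem both dictionary headwords and text words, then look words up by stem. `SUFFIXES`, `simple_stem`, `build_stem_index`, and `lookup` are illustrative names, not the thesis's implementation.

```python
# Hypothetical suffix list, ordered so longer, more specific suffixes win.
SUFFIXES = ("ations", "ation", "ments", "ment", "ings", "ing",
            "ies", "ied", "ed", "es", "s", "e")


def simple_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    w = word.lower()
    for suf in SUFFIXES:
        if not w.endswith(suf) or len(w) - len(suf) < min_stem:
            continue
        if suf == "s" and w.endswith("ss"):
            continue  # keep "process", "class" intact
        stem = w[: -len(suf)]
        # "studies"/"studied" -> "study"
        return stem + "y" if suf in ("ies", "ied") else stem
    return w


def build_stem_index(dictionary):
    """Map each headword's stem to its translation (first headword wins)."""
    index = {}
    for headword, translation in dictionary.items():
        index.setdefault(simple_stem(headword), translation)
    return index


def lookup(word, stem_index):
    """Find a translation for an inflected word via its stem, or None."""
    return stem_index.get(simple_stem(word))
```

Because both sides pass through the same function, "sentence" and "sentences" both reduce to "sentenc" and hit the same entry, even though the stem itself is not a real word.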

Keywords/Search Tags:

Sentence alignment, Bilingual Corpus, Professional Unknown Words, Stem extraction

Related items

1	The Research Of Sentence Alignment In Chinese-Uighur Bilingual Corpus
2	Bilingual Alignment Platform Multi-language Design And Implementation
3	Design And Implementation. IHSMTS Chinese-English Bilingual Sentence Alignment Mechanism
4	Research And Implementation Of Bilingual Corpus Mining On The Internet
5	Research Of Bilingual Sentence Alignment Served The Chinese-Uyghur Machine Translation System
6	A Study On The Alias of Chinese - Old Bilingual Words
7	The Design And Implementation Of Uyghur-Chinese Parallel Corpus Processing System
8	Chinese Uygur Kazak Kirgiz Bilingual Corpus Processing System Design And Implementation
9	Research On The Construction Of Ancient English Parallel Corpus Based On Multi-Level Automatic Alignment
10	Design And Implementation Of Automatic Construction System Of English-chinese Parallel Corpus