Sentence-Level Alignment in English-Chinese Parallel Corpora and Its Application in Machine Translation Studies

Posted on: 2011-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: X M Zhao
GTID: 2195330332479542
Subject: Chinese language text
Abstract/Summary:
With advances in computer technology and the growing need for human communication, building a high-quality machine translation system has become a pressing problem. Since the 1990s, Mona Baker's application of corpus linguistics to translation research has opened up the study of bilingual corpora. At present, parallel corpora are a main focus of corpus research, and as scholars have come to recognize their significance, many research institutions have devoted themselves to constructing them.

A high-quality aligned parallel corpus is the basis on which a machine translation system based on real examples is realized. Building on previous research, this thesis discusses how to develop automatic alignment technology and how to construct the corpus for such a system. The author examines the problems that arise in constructing the corpus, including the choice of sentence boundaries and marks, the alignment and matching of many-to-many sentences, and the rules for constructing a parallel corpus for machine translation from web corpora, and puts forward corresponding solutions based on a series of tests.

First, punctuation marks are important supporting information for sentence-level alignment. Only four marks are used as sentence boundaries: the full stop, semicolon, question mark and exclamation mark. Colons, quotation marks, single quotation marks and parentheses are not treated as sentence boundaries.

Second, anchor information is added as auxiliary information to improve the quality of sentence-level alignment. After extensive testing and adjustment, the anchors chosen are names and structure, numbers, and dates, because this information has a relatively unique location and sequence.

Third, matching is carried out by classification. This method of sentence-level alignment reduces the propagation of errors: suspended sentences are recognized, the corpus is sliced, and the sentence-level alignment is then completed step by step.

Fourth, on constructing the parallel corpus used in machine translation, the thesis suggests selecting websites with a simple page format, such as university thesis databases, journal databases, libraries, and other sources of works in English translation.

Fifth, the thesis proposes introducing an "expert control system" to improve the quality of translation results. The thinking and translation skills of contemporary translators can be used to construct this "expert control system".
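The first two points of the abstract (splitting only on four punctuation marks, and using numbers and dates as anchors) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `split_sentences`, `numeric_anchors` and `anchor_overlap` are hypothetical, the overlap score is one simple choice among many, and name anchors are left out because they would need a bilingual name lexicon. The sketch also assumes that an English mark ends a sentence only when followed by whitespace, so that decimals such as 3.5 are not split; abbreviations such as "Mr." are not handled.

```python
import re
from collections import Counter

# Boundary marks kept by the abstract: full stop, semicolon, question mark,
# exclamation mark. Colons, quotation marks and parentheses never split.
# Chinese full-width marks split immediately; ASCII marks split only when
# followed by whitespace (an assumption, to protect decimals like "3.5").
_BOUNDARY = re.compile(r"(?<=[。；？！])|(?<=[.;?!])\s+")

# Runs of ASCII digits, optionally with a decimal part. Dates are treated
# simply as the numbers they contain (an assumption of this sketch).
_NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")


def split_sentences(text):
    """Split text into sentences at the four boundary marks only."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


def numeric_anchors(sentence):
    """Return the numeric anchors (numbers, date components) in a sentence."""
    return _NUMBER.findall(sentence)


def anchor_overlap(source, target):
    """Score in [0, 1]: shared numeric anchors over the larger anchor count."""
    src = Counter(numeric_anchors(source))
    tgt = Counter(numeric_anchors(target))
    shared = sum((src & tgt).values())
    total = max(sum(src.values()), sum(tgt.values()))
    return shared / total if total else 0.0


if __name__ == "__main__":
    chinese = split_sentences("会议于2011年11月15日举行。你来吗？")
    english = split_sentences("The meeting was held on 15 November 2011. Are you coming?")
    for zh in chinese:
        for en in english:
            print(round(anchor_overlap(zh, en), 2), zh, "|", en)
```

A score like this would only supplement a length-based or lexicon-based aligner: candidate sentence pairs that share numeric anchors gain confidence, while pairs with no anchors are left to the other evidence.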
Keywords: Parallel Corpus, Sentence-Level Alignment, Machine Translation