Noun Phrase Alignment Algorithm for Chinese-English Parallel Corpora

Posted on: 2004-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: S Xue
Full Text: PDF
GTID: 2208360095956184
Subject: Computer software and theory
Abstract/Summary:
With the development of computers and the Internet, the use of bilingual (multilingual) parallel corpora has become an important issue in the field of Natural Language Processing. Parallel corpora have valuable applications in machine translation, bilingual dictionary compilation, word sense disambiguation and Cross-Lingual Information Retrieval.

In the exploitation of parallel corpora, alignment at different levels is an essential research topic. To extract linguistic knowledge from a parallel corpus, the corpus must first be aligned. Alignment is also an important phase before Example-Based Machine Translation (EBMT) can make use of a parallel corpus.

This thesis first introduces the application of bilingual corpora and alignment in Machine (-Aided) Translation. It then discusses the construction of a Large-Scale Chinese-English Parallel Corpus, including resource collection, corpus encoding, sentence alignment and concordance. Next, it presents a noun phrase alignment algorithm that combines rules and statistics. The algorithm uses an English parser to identify English noun phrases, and a set of syntactic patterns to filter out invalid candidates for the Chinese translation correspondences. Finally, the best candidate is selected as the Chinese translation of the English noun phrase using similarity measures based on co-occurrence. This method addresses the weaknesses of traditional purely rule-based approaches that rely on a bilingual dictionary, and it achieves higher accuracy.
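The final selection step can be illustrated with a minimal sketch. The abstract does not name the similarity measure, so the Dice coefficient over sentence-pair co-occurrence is used here purely as an assumption. The function names `dice` and `best_candidate` and the toy corpus are hypothetical, and plain substring matching stands in for the English parser and the syntactic pattern filter described above.

```python
from typing import Optional, Sequence, Tuple

# A sentence-aligned corpus: (English sentence, Chinese sentence) pairs.
SentencePair = Tuple[str, str]


def dice(np_en: str, cand_zh: str, pairs: Sequence[SentencePair]) -> float:
    """Dice coefficient of an English noun phrase and a Chinese candidate.

    Assumption: co-occurrence is counted per aligned sentence pair, and
    presence is tested by substring match (a stand-in for real parsing).
    """
    n_en = n_zh = n_both = 0
    for en, zh in pairs:
        has_en = np_en in en
        has_zh = cand_zh in zh
        n_en += has_en
        n_zh += has_zh
        n_both += has_en and has_zh
    if n_en + n_zh == 0:
        return 0.0
    return 2 * n_both / (n_en + n_zh)


def best_candidate(
    np_en: str, candidates: Sequence[str], pairs: Sequence[SentencePair]
) -> Optional[str]:
    """Return the candidate with the highest Dice score, or None if empty."""
    return max(candidates, key=lambda c: dice(np_en, c, pairs), default=None)


# Toy data, invented for illustration only.
pairs = [
    ("the parallel corpus is large", "平行语料库很大"),
    ("we align the parallel corpus", "我们对齐平行语料库"),
    ("the corpus is large", "语料库很大"),
]
candidates = ["平行语料库", "语料库", "很大"]

if __name__ == "__main__":
    for c in candidates:
        print(c, dice("parallel corpus", c, pairs))
    print(best_candidate("parallel corpus", candidates, pairs))
```

In this toy setup the candidate list is assumed to have already passed the syntactic filter, so the statistical step only ranks what remains. Other co-occurrence measures, such as mutual information, could be swapped in for `dice` without changing `best_candidate`.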
Keywords/Search Tags: parallel corpus, alignment, noun phrase alignment