Parallel Processing On Parallel Corpus Of Chinese-English

Posted on: 2007-10-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M X Feng
Full Text: PDF
GTID: 1115360185977413
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Research on parallel corpora is a new direction in the horizontal development of corpus linguistics. It is well known that a large, high-quality Chinese-English parallel corpus has great value for natural language processing, comparative linguistics research, second-language teaching, and related fields. Compared with monolingual corpora, however, the scale and quality of existing Chinese-English parallel corpora still fall far short of users' needs.

To improve the processing precision of Chinese-English parallel corpora, and thereby meet the requirements of building and using large-scale parallel corpora, this dissertation takes the parallel processing of the Chinese-English parallel corpus as its main research target. It makes use of bilingual information, especially information from the other language, to resolve ambiguities in one language of the parallel corpus.

The following results were obtained through this research:

1. A systematic study of the parallel processing technique was carried out. The study defines what parallel processing means, its position and value in the processing of parallel corpora, and the levels and types of language resources in the parallel corpus that were used for disambiguation. It also demonstrates in detail how the technique is applied, and that it is valid, at each level of natural language processing, such as unknown word recognition, POS tagging, word sense tagging, and syntactic analysis.

2. The parallel processing technique is bidirectional (Chinese-English / English-Chinese). English was used to resolve ambiguities in Chinese, including the recognition of Chinese unknown words, the tagging of polysemous Chinese words, and the recognition of the phrase type of Chinese "Verb+Noun" phrases. Chinese was in turn used to resolve ambiguities in English, such as English POS and word sense disambiguation.

3. The POS and word sense disambiguation approaches based on individual rules were tested on a parallel corpus that is not aligned at the word level. Statistical models are suited to problems where the data are concentrated. This dissertation used a statistical approach to carry out the parallel recognition of the Chinese translations of English...
Keywords/Search Tags: Natural language processing, Bilingual corpus, Parallel corpus of Chinese-English, Parallel processing, Automatic segmentation, Part of speech disambiguation, Word sense disambiguation, Syntactic disambiguation