Extraction of Same-Event Texts from Chinese-English Bilingual Web Resources

Posted on: 2006-09-27  Degree: Master  Type: Thesis
Country: China  Candidate: C Xu
GTID: 2205360155474940  Subject: Linguistics and Applied Linguistics
Abstract/Summary:
As the Internet flourishes, more and more multilingual electronic texts are becoming accessible. In this context, the study of parallel corpora has become one of the hot spots in linguistics. But how can these parallel texts be obtained? This is a fundamental problem that most researchers overlook, and this paper is an exploration in this field.

The paper first analyzes the parallel text resources on bilingual websites and points out that extracting SETPs (same-event text pairs) can help researchers obtain parallel corpora more efficiently and in greater quantity. Based on a study of the characteristics of Chinese-English SETPs, we found that Named Entities can suitably represent a text's theme, so the similarity of Named Entities can be used to extract SETPs.

Based on the characteristics of Chinese-English SETP extraction, the paper examines HowNet and analyzes the relationships among its taxonomy, units, and words. Building on this work, we derive a method that uses HowNet to calculate the similarity between Chinese and English words. Making full use of the linguistic resources available, we suggest using dictionaries of personal names, place names, and pinyin to resolve the similarity of unknown Named Entities, most of which are written in pinyin.

We also construct a distinctive Named Entity structure to represent a text. In this way, the task of extracting SETPs is turned into the task of calculating Named Entity similarity. We also designed a set of formulae expressing that Named Entities in different parts of a text have different importance.

Our approach and method are supported by the test results: our SETP-extraction system achieved an accuracy of 93%.
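The idea of scoring a candidate text pair by position-weighted Named Entity similarity can be illustrated with a minimal sketch. It is not the thesis's actual formulae, which the abstract does not give. The position labels, the weights in POSITION_WEIGHT, the function names, and the toy lexicon lookup that stands in for the HowNet-based word similarity are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical weights: entities in the title count more than those in the body.
# The real weighting formulae are not stated in the abstract.
POSITION_WEIGHT: Dict[str, float] = {"title": 3.0, "lead": 2.0, "body": 1.0}

Entity = Tuple[str, str]            # (surface form, position label)
Similarity = Callable[[str, str], float]


def make_toy_similarity(lexicon: Dict[str, str]) -> Similarity:
    """Build a stand-in for a HowNet-based similarity: 1.0 on a lexicon match, else 0.0."""
    def sim(zh: str, en: str) -> float:
        return 1.0 if zh in lexicon and lexicon[zh].lower() == en.lower() else 0.0
    return sim


def directed_score(src: List[Entity], tgt: List[Entity], sim: Similarity) -> float:
    """Weighted average, over source entities, of the best similarity to any target entity."""
    total = 0.0
    weight_sum = 0.0
    for name, position in src:
        weight = POSITION_WEIGHT[position]
        best = max((sim(name, other) for other, _ in tgt), default=0.0)
        total += weight * best
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def pair_score(zh: List[Entity], en: List[Entity], sim: Similarity) -> float:
    """Symmetric score: average of the Chinese-to-English and English-to-Chinese scores."""
    forward = directed_score(zh, en, sim)
    # Swap the arguments so that sim always receives (Chinese, English).
    backward = directed_score(en, zh, lambda e, z: sim(z, e))
    return (forward + backward) / 2.0


# Toy data: entity recognition is assumed to have been done already.
lexicon = {"北京": "Beijing", "联合国": "United Nations"}
sim = make_toy_similarity(lexicon)
zh_entities = [("北京", "title"), ("联合国", "body"), ("李明", "body")]
en_entities = [("Beijing", "title"), ("United Nations", "lead"), ("Li Ming", "body")]
score = pair_score(zh_entities, en_entities, sim)
print(f"pair score: {score:.3f}")
```

In this toy run the unmatched name pair ("李明" and "Li Ming") lowers the score. That is the kind of unknown Named Entity the abstract proposes to resolve with name, place, and pinyin dictionaries. A pair would be accepted as a SETP when its score passes a threshold chosen on held-out data.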
Keywords/Search Tags: parallel corpora, Named Entity, HowNet, similarity