Automatic Terminology Translation Based On Web And Information Extraction Technologies

Posted on:2008-12-18

Degree:Master

Type:Thesis

Country:China

Candidate:J S Zhou

Full Text:PDF

GTID:2178360245494090

Subject:Computer application technology

Abstract/Summary:

The Internet holds a large amount of multilingual text. Extracting the right translation of a term from non-parallel, comparable, or partially parallel corpora by means of web mining and information extraction is a challenging problem that has drawn wide attention from researchers.

This paper first analyzes current terminology translation technologies and summarizes their advantages and disadvantages. Second, focusing on Chinese-English translation, we analyze the presence and availability of partially parallel corpora on the Web and implement an automatic terminology translation system. The system submits a pair consisting of a source term E and its translation F to a search engine, extracts the matching patterns from the returned summaries, and scores the patterns by their occurrence frequency. For selecting translation candidates, we identify four heuristic rules and an empirical formula to score the candidates. The experiments indicate that, compared with other existing systems, our system can extract reasonable translations of a given term from the Web automatically and effectively.

Many web pages contain a large number of terms. Finding these pages and extracting term pairs from them is also an interesting problem. In this paper we use existing term pairs (terms from a glossary or dictionary) as "seed" words, submit them to a search engine, and then apply four rules to analyze the snippets in the returned summaries and automatically find web pages that contain many terms. Next, we extract these terms according to a specified format using regular expressions and add them to a local dictionary so that the dictionary expands continuously.

This paper also offers some suggestions for Web-based automatic terminology translation, outlines future work, and raises some problems worthy of study.
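The pattern-mining step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `mine_patterns` and `rank_candidates`, the `max_gap` parameter, and the sample snippets are hypothetical. The candidate score used here (the sum of the frequencies of the patterns that produced a candidate) is an assumed stand-in for the four heuristic rules and the empirical formula, which the abstract does not spell out. Snippets are passed in as plain strings, so no real search engine API is called.

```python
import re
from collections import Counter


def mine_patterns(snippets, source, target, max_gap=4):
    """Collect the short connector strings found between a known source
    term and its known translation, scored by relative frequency."""
    counts = Counter()
    # "source <gap> target", where the gap is at most max_gap characters.
    pair = re.compile(
        re.escape(source) + "(.{0,%d}?)" % max_gap + re.escape(target),
        re.IGNORECASE,
    )
    for snippet in snippets:
        for match in pair.finditer(snippet):
            counts[match.group(1)] += 1
    total = sum(counts.values())
    return {gap: n / total for gap, n in counts.items()} if total else {}


def rank_candidates(snippets, source, patterns):
    """Apply mined connectors to snippets for a new source term and rank
    the English candidates. Scoring is an assumption: each candidate gets
    the summed frequency of every pattern occurrence that yielded it."""
    scores = Counter()
    for gap, weight in patterns.items():
        # Candidate: a run of ASCII letters, spaces and hyphens that
        # starts and ends with a letter, right after the connector.
        finder = re.compile(
            re.escape(source) + re.escape(gap)
            + r"([A-Za-z][A-Za-z \-]*[A-Za-z])"
        )
        for snippet in snippets:
            for match in finder.finditer(snippet):
                scores[match.group(1).lower()] += weight
    return scores.most_common()


if __name__ == "__main__":
    # Hypothetical snippets standing in for search-engine summaries.
    seed_snippets = [
        "搜索引擎(search engine)是一种信息检索系统",
        "什么是搜索引擎(Search Engine)?",
        "搜索引擎, search engine, 元搜索",
    ]
    patterns = mine_patterns(seed_snippets, "搜索引擎", "search engine")

    new_snippets = [
        "正则表达式(regular expression)描述了一种字符串匹配模式",
        "正则表达式, regular expression, 简称 regex",
        "正则表达式(regex)教程",
    ]
    for candidate, score in rank_candidates(new_snippets, "正则表达式", patterns):
        print(candidate, round(score, 3))
```

Treating patterns as literal connector strings keeps the sketch short. A fuller system would presumably generalize them (for example, tolerating optional whitespace or full-width punctuation) and filter candidates with additional rules before scoring.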

Keywords/Search Tags:

automatic terminology translation, information extraction, search engine, regular expression

Related items

1	Collected Information Based On Regular Expression Engine Applied Research
2	Research Of Meta Search Engine Based On Agent Technology
3	Research On Automatic Generation Methods Of Regular Expression Matching Engines On FPGA
4	The Research And Implementation Of Web Information Extraction System Based On The Regular Expression
5	Automatic Construction Application Of Bilingual Terminologies For E-Commerce
6	Mail Address Automatic Extraction System Based On Search Engine Secondary Development
7	A Rehabilitation Information Search Engine Based On Online Translation
8	The Design And Implementation Of Regular Expression Engines Based On Deterministic Finite Automata
9	Automatic Click Based On Dynamic IP And Its Influence On The Search Engine Ranking
10	The Design And Implementation Of Website Analysis Module Based On Shopping Search Engine