There is a large amount of multilingual text on the Internet. Extracting the correct translation of a term from non-parallel, comparable, or partially parallel corpora through web mining and information extraction is a challenging problem that has drawn wide attention from researchers.

This paper first analyzes current terminology translation technologies and summarizes their advantages and disadvantages. Second, focusing on Chinese-English translation, we analyze the presence and availability of partially parallel corpora on the Web and implement an automatic terminology translation system. The system submits a pair consisting of a source term E and its translation F to a search engine, extracts matching patterns from the returned summaries, and scores the patterns by their occurrence frequency. For selecting translation candidates, we identify four heuristic rules and an empirical formula for scoring the candidates. Experiments indicate that, compared with other existing systems, our system can automatically and effectively extract reasonable translations of a given term from the Web.

Many web pages contain a large number of terms. Finding these pages and extracting term pairs from them is also an interesting problem. In this paper, we use existing term pairs (terms from glossaries or dictionaries) as "seed" words, submit them to a search engine, and then apply four rules to analyze the snippets in the returned summaries, so that web pages containing many terms are found automatically. Next, we extract the terms in a specified format using regular expressions and add them to a local dictionary, so that the dictionary is continuously expanded.

This paper also offers suggestions for Web-based automatic terminology translation, outlines future work, and raises several problems worth studying.
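The pattern-mining step (submit a known pair E/F, extract matching patterns from the returned summaries, score them by occurrence frequency) can be sketched as follows. This is a minimal sketch, assuming the search-engine summaries have already been retrieved as plain-text snippets; the abstract names no search API, so none is called. The function names `extract_pattern` and `score_patterns`, the `max_gap` limit, and the one-character right context are hypothetical choices, not the system's actual parameters. A pattern here replaces the two terms with placeholders, e.g. `<F>（<E>）`, and its score is simply its raw count.

```python
from collections import Counter
from typing import Iterable, Optional


# Hypothetical helper: abstracts the text around a co-occurring (E, F) pair into a pattern.
def extract_pattern(snippet: str, src_term: str, tgt_term: str,
                    max_gap: int = 3, right_ctx: int = 1) -> Optional[str]:
    """Return a pattern such as '<F>（<E>）', or None if the pair is absent or too far apart."""
    i, j = snippet.find(src_term), snippet.find(tgt_term)
    if i < 0 or j < 0:
        return None
    # (start, end, placeholder) for each term, ordered by position in the snippet
    (s1, end1, tag1), (s2, end2, tag2) = sorted(
        [(i, i + len(src_term), "<E>"), (j, j + len(tgt_term), "<F>")])
    if s2 < end1:                 # overlapping matches cannot form a pattern
        return None
    middle = snippet[end1:s2]     # separator between the two terms, e.g. '（' or ' ('
    if len(middle) > max_gap:     # assumed limit: distant terms are not treated as a pair
        return None
    right = snippet[end2:end2 + right_ctx]  # closing delimiter, e.g. '）'
    return f"{tag1}{middle}{tag2}{right}"


def score_patterns(snippets: Iterable[str], src_term: str, tgt_term: str,
                   **kwargs) -> Counter:
    """Count how often each pattern occurs across the snippets (score = raw frequency)."""
    counts: Counter = Counter()
    for snippet in snippets:
        pattern = extract_pattern(snippet, src_term, tgt_term, **kwargs)
        if pattern is not None:
            counts[pattern] += 1
    return counts


if __name__ == "__main__":
    demo = [
        "数据挖掘（data mining）是一种技术",
        "关于数据挖掘（data mining）的介绍",
        "data mining (数据挖掘) is",
        "数据挖掘与机器学习",          # no English term: contributes no pattern
    ]
    # expected: [('<F>（<E>）', 2), ('<E> (<F>)', 1)]
    print(score_patterns(demo, "data mining", "数据挖掘").most_common())
```

Only the separator and a closing delimiter are kept (no left context), so patterns from different sentences collapse onto the same key and their counts accumulate.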
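Once patterns have been scored, a pattern can be turned back into a regular expression that pulls translation candidates for a source term E out of new snippets. The sketch below is hypothetical: `pattern_to_regex` and `extract_candidates` are illustrative names, only patterns whose unknown slot is enclosed by non-empty literal delimiters (such as ` (` and `)`) are used, and candidates are ranked by raw frequency. It does not implement the paper's four heuristic rules or its empirical scoring formula, which the abstract does not spell out.

```python
import re
from collections import Counter
from typing import Iterable, Optional


def pattern_to_regex(pattern: str, known_term: str, known_tag: str = "<E>",
                     unknown_tag: str = "<F>", max_len: int = 40) -> Optional[re.Pattern]:
    """Compile a placeholder pattern into a regex that captures the unknown term (hypothetical helper)."""
    # The capture group keeps the placeholders: literals sit at even indices, tags at odd ones.
    parts = re.split(r"(<E>|<F>)", pattern)
    if known_tag not in parts or unknown_tag not in parts:
        return None
    k = parts.index(unknown_tag)
    # Simplification: skip patterns whose unknown slot is open on either side,
    # because a lazy capture there has no reliable boundary.
    if not parts[k - 1] or k + 1 >= len(parts) or not parts[k + 1]:
        return None
    pieces = []
    for part in parts:
        if part == known_tag:
            pieces.append(re.escape(known_term))
        elif part == unknown_tag:
            pieces.append("(.{1,%d}?)" % max_len)  # lazy: stop at the first closing delimiter
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


def extract_candidates(snippets: Iterable[str], patterns: Iterable[str],
                       known_term: str, **kwargs) -> Counter:
    """Apply every usable pattern to every snippet and count the captured candidates."""
    regexes = [rx for rx in (pattern_to_regex(p, known_term, **kwargs) for p in patterns)
               if rx is not None]
    counts: Counter = Counter()
    for snippet in snippets:
        for rx in regexes:
            for match in rx.finditer(snippet):
                candidate = match.group(1).strip()
                if candidate:
                    counts[candidate] += 1
    return counts


if __name__ == "__main__":
    patterns = ["<F>（<E>）", "<E> (<F>)"]  # the first is skipped: its <F> slot is open on the left
    demo = [
        "data mining (数据挖掘) is useful",
        "We use data mining (数据挖掘) here",
        "data mining (数据采掘)",
        "数据挖掘（data mining）",
    ]
    # expected: [('数据挖掘', 2), ('数据采掘', 1)]
    print(extract_candidates(demo, patterns, "data mining").most_common())
```

Requiring delimiters on both sides of the unknown slot trades recall for clean term boundaries; resolving open-ended slots is the kind of problem that dedicated candidate-selection rules would have to address.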
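The glossary-harvesting step extracts term pairs "in a specified format" with regular expressions, but the abstract does not give that format. The sketch below assumes a hypothetical one-pair-per-line format, an English term, then `:`, `：`, `=`, or a tab, then a Chinese term, and a local dictionary that maps lowercased English terms to sets of Chinese translations. The regex, the name `update_dictionary`, and the data layout are all assumptions made for illustration.

```python
import re
from typing import Dict, Set

# Hypothetical line format: "<English term> <sep> <Chinese term>", sep in {':', '：', '=', TAB}.
# [ \t]* (not \s*) keeps each match inside a single line.
PAIR_RE = re.compile(
    r"^[ \t]*([A-Za-z][A-Za-z0-9 .\-]*?)[ \t]*[:：=\t][ \t]*"
    r"([\u4e00-\u9fff][\u4e00-\u9fff0-9A-Za-z]*)[ \t]*$",
    re.MULTILINE,
)


def update_dictionary(local_dict: Dict[str, Set[str]], page_text: str) -> int:
    """Add every (English, Chinese) pair found in page_text; return how many were new."""
    added = 0
    for match in PAIR_RE.finditer(page_text):
        english = match.group(1).strip().lower()   # normalize keys to lowercase
        chinese = match.group(2)
        translations = local_dict.setdefault(english, set())
        if chinese not in translations:
            translations.add(chinese)
            added += 1
    return added


if __name__ == "__main__":
    page = (
        "术语表\n"                      # heading line: does not match the format
        "data mining：数据挖掘\n"
        "Machine Learning: 机器学习\n"
        "neural network = 神经网络\n"
        "this line has no pair\n"
    )
    local: Dict[str, Set[str]] = {}
    print(update_dictionary(local, page))   # 3 new pairs
    print(update_dictionary(local, page))   # 0: already in the dictionary
```

Storing a set of translations per key lets the dictionary grow as more glossary pages are harvested without duplicating entries.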