Research Of English-Chinese Word Alignment Based On Multi-Strategy

Posted on: 2010-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: L H Zhou
GTID: 2178360272485240
Subject: Computer application technology
Abstract/Summary:
Word Alignment (WA), which can be defined as an object indicating the corresponding words in a parallel text, was first introduced as an intermediate result of statistical machine translation. In English-Chinese WA, the diversity and flexibility of morphology, semantics, and syntax, together with out-of-vocabulary words and segmentation errors, directly or indirectly affect alignment quality to a certain extent. Notably, people possess much useful information that can improve alignment quality.

To date, many alignment algorithms have been proposed with very high precision. In fact, each representative algorithm has its particular strengths. The statistics-based method can identify some unknown words but needs large-scale bilingual sentence pairs. The dictionary-based method is more reliable, but it cannot recognize unknown words. The HowNet-based method can handle semantic information to a certain extent, but the current capability of HowNet is very limited and the definitions of the sememes should be more fine-grained. This thesis introduces the current state of WA research in China and abroad, and illustrates several representative approaches to WA, with an analysis of their theoretical basis and algorithmic characteristics.

WA of bilingual corpora is useful for many NLP applications. This thesis uses a latent syntactic rule to guide the disambiguation process of WA, based on an analysis of the bilingual corpus and the alignment results. From this viewpoint, a new multi-strategy method is proposed that aims to exploit the information hidden among the existing methods by combining lexical information, the linguistic knowledge of HowNet, and the statistics-based Giza++.

Experiments show that the method performs well, obtaining an F-score of 85.15%, an improvement of 10% over optimized Giza++. At the same time, the alignment error ratio decreases by 10%, and better performance is obtained in an SMT application. The strategy combines the advantages of those algorithms according to their reliability and consistency. Analysis and comparison of the results show that the multi-strategy method is effective, easy to implement, and can be adapted to support additional WA methods. Analysis of the multi-strategy results shows that alignment errors occur mainly where word segmentation is wrong. To optimize the results, a single-word alignment module was implemented, which obtained an F-score of 95.01% and an alignment error ratio of 0.05.
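
The abstract does not state how the F-score and alignment error ratio were computed. As a minimal sketch only, the following assumes the definitions commonly used in word alignment evaluation, where predicted links are compared against "sure" and "possible" gold links. The function name, the link representation, and the toy data are hypothetical and are not taken from the thesis.

```python
from typing import Optional, Set, Tuple

# A link pairs an English word index with a Chinese word index (assumed representation).
Link = Tuple[int, int]


def alignment_scores(
    predicted: Set[Link],
    sure: Set[Link],
    possible: Optional[Set[Link]] = None,
) -> Tuple[float, float, float, float]:
    """Return (precision, recall, f_score, aer) for one set of predicted links.

    Assumes the commonly used definitions:
      precision = |A & P| / |A|
      recall    = |A & S| / |S|
      AER       = 1 - (|A & S| + |A & P|) / (|A| + |S|)
    where A is the predicted set, S the sure gold links, P the possible gold links.
    """
    # With a single gold annotation, treat every gold link as both sure and possible.
    if possible is None:
        possible = sure
    possible = possible | sure  # sure links are always counted as possible

    a_and_s = len(predicted & sure)
    a_and_p = len(predicted & possible)

    precision = a_and_p / len(predicted) if predicted else 0.0
    recall = a_and_s / len(sure) if sure else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom > 0 else 0.0

    total = len(predicted) + len(sure)
    aer = 1.0 - (a_and_s + a_and_p) / total if total > 0 else 0.0
    return precision, recall, f_score, aer


# Toy example (invented data): 3 predicted links, 4 gold links, 2 in common.
predicted_links = {(0, 0), (1, 1), (2, 3)}
gold_links = {(0, 0), (1, 1), (2, 2), (3, 3)}
p, r, f, aer = alignment_scores(predicted_links, gold_links)
print(f"precision={p:.4f} recall={r:.4f} f_score={f:.4f} aer={aer:.4f}")
```

Under these assumed definitions, when the gold standard has only one kind of link the error ratio equals one minus the F-score, which is consistent with the reported pair of 95.01% and 0.05.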
Keywords/Search Tags: Word Alignment, Multi-strategy, Wholly Compatible, Semi-compatible, Incompatible, Anchor