Chinese collocation extraction and its application in natural language processing

Posted on: 2008-09-03
Degree: Ph.D.
Type: Dissertation
University: Hong Kong Polytechnic University (Hong Kong)
Candidate: Li, Wanyin Claire
GTID: 1445390005462247
Subject: Computer Science
Abstract/Summary:
Traditional approaches to collocation extraction mainly use statistical models based on co-occurrence association measures, which leads to poor performance in terms of both recall and precision. This study explores methods that use collocation features in terms of statistical significance as well as syntactic and semantic information.

The first part of this study investigates how to adapt Xtract, a well-known statistics-based system for English, to Chinese collocation extraction. In addition to tuning its parameters for Chinese, an enhanced algorithm based on mutual information is developed to extract collocations with relatively low frequencies and so improve recall.

The second part investigates methods that take syntactic information into account to eliminate pseudo collocations and to identify low-frequency collocations that fit certain syntactic patterns. The syntactic information is based on part-of-speech (POS) tagging patterns obtained from a chunked Chinese corpus; however, the extraction algorithm does not require the test data to be chunked.

The third part investigates methods that take semantic information into account, using synonym information to further improve the recall of collocation extraction.

The last part explores how to make use of collocation information in word sense disambiguation (WSD). Results show that collocation information can improve WSD performance by 3% to 10%, depending on the data set.
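The abstract does not give the exact formulas used by Xtract or by the enhanced mutual-information algorithm, so the following is only a minimal Python sketch of how a co-occurrence association measure and a syntactic filter can work together. It assumes pointwise mutual information (PMI) computed over a fixed word window, a minimum pair frequency, and an optional whitelist of POS patterns; the function name `pmi_collocations`, the window size, the threshold and the tag set are hypothetical illustrations, not the dissertation's method.

```python
import math
from collections import Counter


def pmi_collocations(tagged_sentences, window=2, min_count=2, allowed_patterns=None):
    """Rank word pairs by pointwise mutual information (a hypothetical sketch).

    tagged_sentences: list of sentences, each a list of (word, pos) tuples.
    window: how many following tokens count as co-occurring with a token.
    min_count: pairs seen fewer times than this are ignored.
    allowed_patterns: optional set of (pos1, pos2) tuples; pairs whose POS
        pattern is not in the set are dropped as pseudo collocations.
    """
    token_counts = Counter()   # counts of (word, pos) tokens
    pair_counts = Counter()    # counts of ordered token pairs within the window
    total_tokens = 0

    for sent in tagged_sentences:
        total_tokens += len(sent)
        for i, token in enumerate(sent):
            token_counts[token] += 1
            # Pair the token with each of the next `window` tokens.
            for j in range(i + 1, min(i + 1 + window, len(sent))):
                pair_counts[(token, sent[j])] += 1

    total_pairs = sum(pair_counts.values())
    results = []
    for (t1, t2), c12 in pair_counts.items():
        if c12 < min_count:
            continue
        # Syntactic filter: keep only pairs matching an allowed POS pattern.
        if allowed_patterns is not None and (t1[1], t2[1]) not in allowed_patterns:
            continue
        # Simple maximum-likelihood estimates of joint and marginal probabilities.
        p12 = c12 / total_pairs
        p1 = token_counts[t1] / total_tokens
        p2 = token_counts[t2] / total_tokens
        results.append(((t1, t2), math.log2(p12 / (p1 * p2))))

    # Highest association first.
    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    # Toy corpus (invented for illustration): variants of 发挥(重要)作用 "play an (important) role".
    corpus = [
        [("发挥", "v"), ("重要", "a"), ("作用", "n")],
        [("发挥", "v"), ("作用", "n")],
        [("起", "v"), ("作用", "n")],
        [("发挥", "v"), ("重要", "a"), ("作用", "n")],
    ]
    print(pmi_collocations(corpus))
    print(pmi_collocations(corpus, allowed_patterns={("v", "n")}))
```

On this toy corpus, PMI alone ranks the fragment 发挥/重要 highest even though it is only part of the longer phrase 发挥重要作用; restricting candidates to a verb-noun pattern leaves 发挥/作用 as the only result. This mirrors, in a very simplified form, the abstract's point that syntactic information helps remove pseudo collocations that purely statistical scores let through.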
Keywords/Search Tags: Collocation, Information, Chinese, Performance