A Study On Morphological Features Of English Vocabulary For Automatic Term Extraction

Posted on: 2013-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: S Li
GTID: 2235330371970938
Subject: Foreign Linguistics and Applied Linguistics
Abstract/Summary:
Terminology is the crystallization of knowledge, and examining the terminology of a subject is an effective way to understand its general trends and development. Terminology extraction is one of the key technologies for constructing a large-scale ontology automatically or semi-automatically, and corpus-based automatic term extraction is currently an active topic in natural language processing. Three approaches predominate in term extraction studies: the linguistic approach, the statistical approach and the hybrid approach. On the basis of a literature review, the present study proposes a new approach to term extraction based on morphological features and investigates the contribution that morphological features make to improving automatic term extraction.

Based on a corpus of 1.2 million words, the present study evaluates the efficiency of different term extraction approaches using self-written FoxPro programs. The thesis mainly compares single-algorithm approaches with hybrid approaches. The single-algorithm approaches are based on statistics, on combinations of parts of speech (hereinafter posgram), and on morphological features. The hybrid approaches are statistics with morphological features as a filter, posgram with morphological features as a filter, and morphological features with posgram as a filter. In terms of precision, the approach based on morphological features with posgram as a filter ranks highest at 44.60%, followed by the approach based on posgram with morphological features as a filter at 42.19%. In terms of recall, the posgram-driven approach is the most efficient at 55.89%, followed by the morphology-driven approach at 49.96%.

Moreover, the precision and recall of the statistics-driven approach are fairly low, whereas both improve significantly once the restriction of morphological features is added. Therefore, morphological features have a positive influence on the efficiency of term extraction.
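The following is a minimal Python sketch of how a driver-plus-filter extraction and its precision and recall scores could be computed. It is not the thesis's FoxPro implementation: the suffix list, the posgram patterns, the toy candidates and the gold term list are all hypothetical values chosen for illustration. In this simplified form both the driver and the filter are yes/no tests over one shared candidate list, so swapping them gives the same result. The thesis reports different scores for the two orderings, presumably because there the driving approach also generates the candidates.

```python
# Illustrative sketch only. The suffixes, POS patterns and toy data are
# assumptions made for this example, not the feature set used in the thesis.

TERM_SUFFIXES = ("tion", "ology", "ism", "ity", "ment")      # hypothetical
TERM_POSGRAMS = {("NN",), ("JJ", "NN"), ("NN", "NN")}        # hypothetical


def has_term_morphology(candidate):
    """Check whether the last (head) word ends in a term-like suffix."""
    head_word = candidate[-1][0].lower()
    return head_word.endswith(TERM_SUFFIXES)


def matches_posgram(candidate):
    """Check whether the candidate's POS tag sequence is an allowed pattern."""
    return tuple(pos for _, pos in candidate) in TERM_POSGRAMS


def extract(candidates, driver, filt=None):
    """Keep candidates accepted by the driver and, if given, by the filter."""
    selected = [c for c in candidates if driver(c)]
    if filt is not None:
        selected = [c for c in selected if filt(c)]
    return {" ".join(word.lower() for word, _ in c) for c in selected}


def precision_recall(extracted, gold):
    """Precision = correct / extracted; recall = correct / gold terms."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data: each candidate is a list of (word, POS tag) pairs.
candidates = [
    [("term", "NN"), ("extraction", "NN")],
    [("large", "JJ"), ("corpus", "NN")],
    [("quickly", "RB"), ("run", "VB")],
    [("morphology", "NN")],
]
gold = {"term extraction", "morphology", "ontology"}

posgram_only = extract(candidates, matches_posgram)
posgram_filtered = extract(candidates, matches_posgram, has_term_morphology)

print(precision_recall(posgram_only, gold))
print(precision_recall(posgram_filtered, gold))
```

On this toy data the morphological filter removes the one non-term that the posgram test lets through, which raises precision without lowering recall. Real corpora will not behave so cleanly, and a filter can also discard genuine terms.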
Keywords: Term Extraction, Precision, Recall, Morphological Features