
The Research And Implementation Of Keyword Extraction

Posted on: 2009-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Luo
Full Text: PDF
GTID: 2178360278456784
Subject: Computer Science and Technology
Abstract/Summary:
Keywords are widely used in applications such as information retrieval, automatic summarization, text classification, and text clustering. Only a small minority of documents have author-assigned keywords, and assigning keywords manually is laborious, so automating keyword extraction is highly desirable. Many academic journals require authors to provide a list of about five to fifteen keywords on the first page of each article. Because these keywords are often phrases of two or more words, we prefer to call them key phrases. Most keywords in other kinds of documents are also phrases, which makes the task more difficult.

This paper argues that keyword extraction can be treated as two problems: extracting key words and extracting key phrases. We propose a keyword extraction method based on a separation model, which develops different features for each of the two problems in order to improve accuracy. We also treat automatic keyword extraction as a supervised learning task: a document is regarded as a set of words or phrases, and the learning algorithm must learn to classify each one as a positive or negative example of a keyword. We develop a set of features based on the different structures of key words and key phrases. For example, mutual information and a parameter table of word-sequence boundaries can improve phrase identification. We also use the part-of-speech patterns of key words and key phrases to develop linguistic features that improve the extraction of both.

Based on this work, we ran experiments to evaluate the keyword extraction method based on the separation model. The results show that, using the same features, the algorithm based on the separation model performs better than one based on an integrated model. We also evaluated the effect of the individual features for key words and key phrases. Finally, to compare our work with the well-known keyword extractor KEA, we implemented a keyword extractor based on the separation model that uses different features for key words and for key phrases. The results show that our extractor performs better than KEA.
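
The abstract names mutual information as a feature for identifying phrases but does not give the formulation used in the thesis. As a rough illustration only, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), a common variant; the function name `bigram_pmi`, the whitespace-style token list, and the plain relative-frequency probability estimates are all assumptions made for this example, not details from the thesis.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in tokens.

    Hypothetical helper: probabilities are plain relative frequencies,
    which is an assumption, not the thesis's stated method.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # avoid division by zero on tiny inputs

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi            # probability of the adjacent pair
        p_x = unigrams[w1] / n_uni     # probability of the first word
        p_y = unigrams[w2] / n_uni     # probability of the second word
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores
```

A pair with a high score co-occurs more often than its individual word frequencies would predict, which is one plausible signal that the pair forms a phrase. In a supervised setup like the one described, such a score could be one feature among several passed to the classifier.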
Keywords/Search Tags: keyword extraction, key phrases, separating models, mutual information, parameter table of word-sequence boundary, feature selection, machine learning, linguistic feature