The Research And Implementation Of Keyword Extraction

Posted on:2009-07-28

Degree:Master

Type:Thesis

Country:China

Candidate:Z C Luo

Full Text:PDF

GTID:2178360278456784

Subject:Computer Science and Technology

Abstract/Summary:

Keywords are widely used in applications such as information retrieval, automatic summarization, text classification and text clustering. Only a small minority of documents have author-assigned keywords, and assigning keywords manually is very laborious, so it is highly desirable to automate keyword extraction. Many academic journals require authors to provide a list of about five to fifteen keywords on the first page of each article. Since these keywords are often phrases consisting of two or more words, we prefer to call them key phrases. Most keywords in other kinds of documents are also phrases, which makes the task more difficult.

This paper argues that keyword extraction can be treated as two problems: extracting key words and extracting key phrases. We propose a keyword extraction method based on a separation model, which develops different features for each of the two problems in order to improve accuracy. The paper treats automatic keyword extraction as a supervised learning task: a document is viewed as a set of candidate words and phrases, which the learning algorithm must learn to classify as positive or negative examples of keywords. Because key words and key phrases differ in structure, we develop features suited to each. For example, mutual information and a parameter table of word-sequence boundaries are used as features to improve phrase identification. We also use part-of-speech rules for key words and key phrases to derive linguistic features that improve the extraction of both.

Based on this work, we ran experiments to evaluate the keyword extraction method based on the separation model.
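
The separation idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: candidates are assumed to be contiguous n-grams of an already tokenized (for Chinese, already segmented) text; both candidate types share two assumed features, relative frequency and relative position of first occurrence; phrases additionally get a mutual-information feature, computed here as the weakest pointwise mutual information between adjacent words. The word-sequence boundary table and the part-of-speech rules are not defined in the abstract and are left out. All function names, and the hand-weighted `word_model` and `phrase_model` that stand in for trained classifiers, are hypothetical.

```python
import math
from collections import Counter


def candidates(tokens, max_len=3):
    """Yield every contiguous n-gram (1..max_len tokens) with its start index."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n]), i


def pmi(w1, w2, unigrams, bigrams, total):
    """Pointwise mutual information log2(P(w1 w2) / (P(w1) * P(w2)))."""
    p_xy = bigrams[(w1, w2)] / (total - 1)  # a text of n tokens has n - 1 adjacent pairs
    p_x = unigrams[w1] / total
    p_y = unigrams[w2] / total
    return math.log2(p_xy / (p_x * p_y))


def extract_features(tokens, max_len=3):
    """Split candidates into single words and phrases, each with its own features.

    Words:   [relative frequency, relative first position]
    Phrases: the same two features plus cohesion (weakest adjacent-pair PMI).
    """
    total = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    freq, first_pos = Counter(), {}
    for gram, pos in candidates(tokens, max_len):
        freq[gram] += 1
        first_pos.setdefault(gram, pos)  # positions of one n-gram arrive in increasing order

    word_feats, phrase_feats = {}, {}
    for gram, count in freq.items():
        base = [count / total, first_pos[gram] / total]
        if len(gram) == 1:
            word_feats[gram] = base
        else:
            cohesion = min(pmi(a, b, unigrams, bigrams, total)
                           for a, b in zip(gram, gram[1:]))
            phrase_feats[gram] = base + [cohesion]
    return word_feats, phrase_feats


def extract_keywords(tokens, word_model, phrase_model, k=5):
    """Score words and phrases with separate models, then merge into one top-k list."""
    word_feats, phrase_feats = extract_features(tokens)
    scored = [(word_model(f), g) for g, f in word_feats.items()]
    scored += [(phrase_model(f), g) for g, f in phrase_feats.items()]
    scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep insertion order
    return [" ".join(g) for _, g in scored[:k]]


# Hypothetical stand-ins for trained classifiers (hand-set weights, not learned).
def word_model(f):
    return f[0] - 0.1 * f[1]


def phrase_model(f):
    return f[0] - 0.1 * f[1] + 0.05 * f[2]


tokens = ("keyword extraction uses mutual information and "
          "keyword extraction uses boundary features").split()
print(extract_keywords(tokens, word_model, phrase_model, k=3))
```

Keeping two feature vectors of different lengths, scored by two separate functions, is what lets each model use features that only make sense for its own candidate type; an integrated model would instead need one shared feature set for words and phrases alike.
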
The experiments show that, with the same features, the keyword extraction algorithm based on the separation model performs better than one based on an integrated model. We also evaluated the effect of the individual features for key words and key phrases. Finally, to compare this work with the well-known keyword extractor KEA, we implemented a keyword extractor based on the separation model that uses separate feature sets for key words and key phrases. The results show that our extractor outperforms KEA.
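
The abstract does not state which measure backs these comparisons. Precision, recall and F1 against author-assigned keywords are a common choice in keyphrase evaluation, so the sketch below assumes them, with exact matching after case-folding; some evaluations also stem phrases before matching, a step omitted here. The function names are hypothetical, and the toy inputs at the end are made up for illustration, not data from the thesis.

```python
def precision_recall_f1(extracted, gold):
    """Exact-match precision, recall and F1 after case-folding (no stemming)."""
    ext = {k.strip().lower() for k in extracted}
    ref = {k.strip().lower() for k in gold}
    hits = len(ext & ref)
    p = hits / len(ext) if ext else 0.0
    r = hits / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def macro_average(system_outputs, gold_sets):
    """Average the per-document (P, R, F1) triples over a test collection."""
    scores = [precision_recall_f1(e, g) for e, g in zip(system_outputs, gold_sets)]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Toy inputs for illustration only: two systems scored against the same gold sets.
gold = [["mutual information", "keyword extraction"], ["key phrases"]]
system_a = [["keyword extraction", "features"], ["key phrases"]]
system_b = [["features", "text"], ["key phrases", "models"]]
print(macro_average(system_a, gold), macro_average(system_b, gold))
```

Scoring every system against the same gold sets with the same matching rule is what makes a comparison like "separation model vs. integrated model" or "our extractor vs. KEA" meaningful.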

Keywords/Search Tags:

keyword extraction, key phrases, separating models, mutual information, parameter table of word-sequence boundary, feature selection, machine learning, linguistic feature

Related items

1	A Study On Feature Selection Algorithms Using Information Entropy
2	The Research Of Multi-label Feature Selection Based On Mutual Information And Feature Label Relationship
3	Research Of Feature Selection For Text Classification
4	Design And Implementation Of Feature Extraction System For Large-Scale Structured Data
5	Research On Feature Selection Algorithms In Machine Learning
6	Research On Dynamic Feature Selection Algorithm Based On Mutual Information
7	Research On Feature Selection Algorithm Based On Mutual Information
8	Research On Chinese Keyword Extraction Algorithm Based On News Report
9	Research On Feature Selection Based On Information Metrics
10	Analysis Of Internet Hot Topics Based On Key Phrases