Multi-label Text Classification Algorithm Research

Posted on:2011-12-17

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Lv

GTID:2178360308976258

Subject:Computer application technology

Abstract/Summary:

Text classification can be divided, by the number of class labels per document, into single-label classification and multi-label classification. In practical applications, multi-label classification is quite common. Current research on multi-label classification focuses on feature selection and on classification algorithms. The performance of existing multi-label feature selection algorithms is not yet satisfactory: some have high time complexity, and others have little effect on classification performance. Likewise, some multi-label classification algorithms do not consider the relationships between labels, and some cannot present their classification rules explicitly.

After studying existing multi-label feature selection algorithms and the characteristics of the bootstrap, this thesis proposes a Bootstrap-based Combined Multi-label Feature Selection algorithm. First, a training set is sampled from the original data set by bootstrap, and the feature sets are evaluated with a base feature selection algorithm. Next, the weight of each feature is determined by voting over the results of the base feature selection algorithm. Finally, features are selected according to their weights. Experiments show that the algorithm improves classification performance.

Multi-label classification algorithms are also studied. Rough set theory is applied to multi-label text classification, and a multi-label text classification algorithm based on rough sets is proposed. The algorithm matches each test instance, one by one, against the classification rules acquired for each category in the training phase, and thereby obtains the label set of the instance.
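The bootstrap-based feature selection steps described above (resample, score with a base selector, vote, select by weight) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the base scorer `label_term_score`, the parameters `n_rounds`, `top_k` and `n_select`, and the rule that zero-score terms receive no vote are all hypothetical choices made for the example.

```python
import random
from collections import Counter


def label_term_score(docs, labels, term):
    """Hypothetical base scorer: the largest gap, over all labels, between the
    term's document frequency inside and outside that label."""
    best = 0.0
    for lab in set().union(*labels):
        inside = [d for d, ls in zip(docs, labels) if lab in ls]
        outside = [d for d, ls in zip(docs, labels) if lab not in ls]
        if not inside or not outside:
            continue  # label covers all or none of the sample: no contrast
        p_in = sum(term in d for d in inside) / len(inside)
        p_out = sum(term in d for d in outside) / len(outside)
        best = max(best, abs(p_in - p_out))
    return best


def bootstrap_feature_selection(docs, labels, n_rounds=20, top_k=2,
                                n_select=2, seed=0):
    """docs: list of term sets; labels: list of label sets (same length)."""
    rng = random.Random(seed)
    n = len(docs)
    votes = Counter()
    for _ in range(n_rounds):
        # Bootstrap: draw n documents with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        s_docs = [docs[i] for i in idx]
        s_labels = [labels[i] for i in idx]
        vocab = sorted(set().union(*s_docs))
        scores = {t: label_term_score(s_docs, s_labels, t) for t in vocab}
        ranked = sorted(vocab, key=lambda t: (-scores[t], t))
        # Each round votes for its top_k terms (assumed voting rule);
        # terms with a zero score get no vote.
        for term in ranked[:top_k]:
            if scores[term] > 0:
                votes[term] += 1
    # Feature weight = fraction of rounds that voted for the feature.
    weights = {t: c / n_rounds for t, c in votes.items()}
    selected = sorted(weights, key=lambda t: (-weights[t], t))[:n_select]
    return selected, weights


docs = [
    {"the", "goal", "match"},
    {"the", "goal", "team"},
    {"the", "vote", "party"},
    {"the", "vote", "law"},
    {"the", "goal", "vote"},
]
labels = [{"sports"}, {"sports"}, {"politics"}, {"politics"},
          {"sports", "politics"}]
selected, weights = bootstrap_feature_selection(docs, labels)
print(selected, weights)
```

In this toy data the term "the" occurs in every document, so its score is always zero and it never receives a vote; which terms end up selected depends on the random resampling.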
The rough-set-based algorithm extends the application of rough set theory in text classification.

To take the relationships between labels into account, a multi-label document classification algorithm based on frequent itemsets is proposed. It mines the associations between labels with a frequent itemset algorithm and uses the resulting association rules to validate the classification results. First, the FP-growth algorithm is used to mine frequent itemsets among the labels, and a prototype vector and a similarity threshold are calculated for each class. If the similarity between a class's prototype vector and a text is greater than the corresponding threshold, the text is assigned to that category. After classification, the association rules between the labels are used to verify the result. The experimental results show that the algorithms are effective and feasible.
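The prototype-and-threshold classification with rule-based verification described above can be sketched as follows. Several details are assumptions made for illustration, since the abstract does not specify them: prototypes are taken to be mean term-frequency vectors, similarity is cosine, thresholds are supplied by the caller, label rules are mined by simple pairwise counting rather than FP-growth, and "verification" is interpreted as adding the consequent of any rule whose antecedent label was predicted. All function and parameter names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two sparse term -> weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def prototypes(docs, labels):
    """Assumed prototype: mean term-frequency vector of each label's documents."""
    sums, counts = defaultdict(Counter), Counter()
    for doc, labs in zip(docs, labels):
        for lab in labs:
            sums[lab].update(doc)
            counts[lab] += 1
    return {lab: {t: c / counts[lab] for t, c in vec.items()}
            for lab, vec in sums.items()}


def label_rules(labels, min_support=2, min_conf=0.8):
    """Rules a -> b from frequent label pairs.
    Plain pairwise counting stands in for FP-growth here."""
    single, pair = Counter(), Counter()
    for labs in labels:
        single.update(labs)
        pair.update(permutations(sorted(labs), 2))
    return {(a, b) for (a, b), c in pair.items()
            if c >= min_support and c / single[a] >= min_conf}


def classify(doc, protos, thresholds, rules):
    # Assign every label whose prototype is similar enough to the text.
    predicted = {lab for lab, p in protos.items()
                 if cosine(doc, p) > thresholds[lab]}
    # Assumed verification step: apply rules a -> b until nothing changes.
    added = True
    while added:
        extra = {b for a, b in rules if a in predicted} - predicted
        predicted |= extra
        added = bool(extra)
    return predicted


train_docs = [
    {"goal": 2, "team": 1},
    {"goal": 1, "match": 1},
    {"budget": 2, "tax": 1},
    {"tax": 1, "vote": 1},
    {"vote": 2, "party": 1},
]
train_labels = [{"sports"}, {"sports"}, {"economy", "politics"},
                {"economy", "politics"}, {"politics"}]

protos = prototypes(train_docs, train_labels)
rules = label_rules(train_labels)
thresholds = {lab: 0.7 for lab in protos}  # fixed threshold, for illustration only
print(classify({"budget": 1, "tax": 1}, protos, thresholds, rules))
```

In this toy example the text about budgets and taxes passes only the "economy" threshold on its own; the mined rule economy -> politics then adds "politics" during the verification step.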

Keywords/Search Tags:

multi-label, rough set, frequent itemsets, Bootstrap, feature selection

Related items

1	Research On Top-K Frequent Itemsets Datamining Algorithm
2	An Algorithm And Context Analysis Of Mining Frequent Closet Itemsets
3	Research On Algorithm For Mining Frequent Itemsets Of Uncertain Data
4	FP-Tree Based Mining Frequent Itemsets Over Data Streams
5	The Research And Implementation Of Mining Frequent Itemsets Algorithm Over Streaming Data
6	Research On Frequent Closed Itemsets Mining Algorithms
7	Research On Algorithms For Mining Maximal Frequent Itemsets
8	Research On Incremental Updating Of Maximum Frequent Itemsets And Maximum Length Frequent Itemsets
9	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application
10	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System