| Depending on the number of class labels assigned to each document, text classification can be divided into single-label classification and multi-label classification. Multi-label classification is common in practical applications. Current research on multi-label classification focuses on feature selection and on classification algorithms. The performance of existing multi-label feature selection algorithms is not yet satisfactory: some have high time complexity, and others have little effect on classification performance. In addition, some multi-label classification algorithms do not consider the relationships between labels, and some cannot present their classification rules explicitly. Based on a study of existing multi-label feature selection algorithms and on the characteristics of the bootstrap, this paper proposes a Bootstrap-based Combined Multi-label Feature Selection algorithm. First, training sets are drawn from the original data set by bootstrap sampling, and the features are evaluated on them with a base feature selection algorithm. The weight of each feature is then determined by voting over the results of the base feature selection algorithm. Finally, features are selected according to their weights. Experiments show that the algorithm improves classification performance. Multi-label classification algorithms are also studied. Rough set theory is applied to multi-label text classification, and a multi-label text classification algorithm based on rough sets is proposed. The algorithm matches each test instance, one by one, against the classification rules acquired for each category in the training phase in order to obtain the label set of the instance. 
The algorithm extends the application of rough set theory in text classification. To take the relationships between labels into account, a frequent-itemset algorithm is used to mine the associations between labels, and the resulting association rules are used to validate the classification results; on this basis, a multi-label document classification algorithm based on frequent itemsets is proposed. First, the FP-growth algorithm mines frequent itemsets among the labels, while a prototype vector and a similarity threshold are computed for each class. If the similarity between a text and the prototype vector of a class is greater than the corresponding threshold, the text is assigned to that category. After classification, the association rules between the labels are used to verify the classification results. Experimental results show that the algorithms are effective and feasible. |
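
The bootstrap-and-vote feature selection procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract does not name the base feature selection algorithm, the number of bootstrap rounds, or the exact voting rule, so everything specific here is an assumption made for illustration: `base_scores` is a hypothetical stand-in scorer, each round casts one vote for each of its top `k` features, and a feature's weight is the fraction of rounds in which it received a vote.

```python
import random
from collections import Counter


def base_scores(X, Y):
    """Hypothetical base scorer (an assumption, not the paper's choice).

    For each feature, sum over labels the absolute difference between the
    feature's mean value in documents with the label and without it.
    """
    n_features = len(X[0])
    n_labels = len(Y[0])
    scores = [0.0] * n_features
    for l in range(n_labels):
        pos = [x for x, y in zip(X, Y) if y[l] == 1]
        neg = [x for x, y in zip(X, Y) if y[l] == 0]
        if not pos or not neg:
            continue  # label is absent or universal in this sample
        for f in range(n_features):
            mean_pos = sum(x[f] for x in pos) / len(pos)
            mean_neg = sum(x[f] for x in neg) / len(neg)
            scores[f] += abs(mean_pos - mean_neg)
    return scores


def bootstrap_feature_weights(X, Y, k, rounds=30, seed=0):
    """Weight of a feature = fraction of bootstrap rounds in which the
    base scorer ranked it among the top k (assumed voting rule)."""
    rng = random.Random(seed)
    n = len(X)
    votes = Counter()
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb = [X[i] for i in idx]
        Yb = [Y[i] for i in idx]
        scores = base_scores(Xb, Yb)
        ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
        votes.update(ranked[:k])  # one vote per top-k feature in this round
    return [votes[f] / rounds for f in range(len(X[0]))]


def select_features(weights, k):
    """Indices of the k features with the largest vote weights."""
    ranked = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    return sorted(ranked[:k])


# Toy data: features 0 and 1 mirror labels 0 and 1; features 2 and 3 are constant.
Y = [(1, 0), (1, 0), (0, 1), (0, 1), (1, 1), (0, 0), (1, 1), (0, 0)]
X = [[y0, y1, 1, 0] for y0, y1 in Y]
weights = bootstrap_feature_weights(X, Y, k=2)
print(select_features(weights, k=2))
```

On this toy data the two constant features never score above zero, so the vote concentrates on features 0 and 1. In practice the stand-in scorer would be replaced by whichever multi-label feature selection method serves as the base algorithm.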