| Text classification automatically assigns a new text to one or more classes of a given categorization system according to its content. Many text classification methods exist, such as the KNN algorithm, the Rocchio classification algorithm, decision tree algorithms, and the Naive Bayes algorithm. Rough set theory is an effective mathematical tool for analyzing uncertain knowledge: it can be applied directly to data analysis and processing to discover the knowledge and laws inherent in the data. Applying rough set theory to text classification is currently an active research topic. Research on rough set theory focuses on the theory and application of knowledge reduction, which makes it possible to obtain classification decisions and rules without reducing classification capability. Attribute reduction is the process of using knowledge reduction to process the attribute information table and remove redundant information without affecting classification ability. The attribute reduction of the decision table is the most important part of text classification based on rough set theory, and it is divided into two steps: ranking the attributes by importance, and obtaining the reduced attribute set. This paper focuses on attribute reduction, the core of applying the knowledge reduction theory of rough sets to text classification. First, feature selection, feature extraction, and text representation are applied to the training text set to obtain a set of feature vectors that represent the texts and their categories. Then, this information is used to build a decision information table. Finally, knowledge reduction is used to process the attribute information of the decision table, removing redundant information and yielding classification rules without affecting classification ability. 
The main contributions of this paper are as follows: (1) The calculation of the approximation operators in rough set theory is improved. On the one hand, the equivalence relation of rough set theory is extended to a tolerance relation or an inclusion relation; on the other hand, approximation operators based on neighborhood systems and granularity are studied from the perspective of the structure of basic knowledge granules and the representation of knowledge. (2) On the basis of text classification and rough set theory, an evaluation criterion for attribute importance is put forward that combines feature selection with rough set theory. In the text categorization process, the weights of the feature items are combined with the evaluation criterion of rough sets itself, so that the attributes retained after reduction are more important and yield a better text recognition rate. In addition, through the study of attribute reduction algorithms in rough set theory, an improved attribute reduction algorithm is proposed and applied to text classification. Numerical experiments show that this text classification technique achieves better classification results on small-scale text sets. |
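
To make the attribute reduction step concrete, the following is a minimal Python sketch of a textbook greedy reduction driven by the rough set dependency degree (the share of objects in the positive region). It is not the improved algorithm proposed in this paper, and it does not include the feature-weight component of the importance criterion. The function names `partition`, `dependency`, and `greedy_reduct`, as well as the toy decision table, are assumptions made purely for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Share of rows in the positive region: classes whose rows all carry one label."""
    if not rows:
        return 0.0
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)


def greedy_reduct(rows, labels):
    """Add the attribute that raises the dependency most until it matches the full set."""
    n_attrs = len(rows[0])
    full = dependency(rows, labels, list(range(n_attrs)))
    reduct = []
    while dependency(rows, labels, reduct) < full:
        remaining = [a for a in range(n_attrs) if a not in reduct]
        best = max(remaining, key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return sorted(reduct)


# Hypothetical decision table: each row is one text, each column marks
# whether a feature term occurs (1) or not (0); labels are the text classes.
rows = [
    (1, 0, 1),
    (1, 1, 1),
    (0, 0, 1),
    (0, 1, 1),
]
labels = ["sports", "sports", "finance", "finance"]

print(greedy_reduct(rows, labels))
```

In this toy table the first feature alone separates the two classes, so the other two columns are redundant and are dropped. A weighted variant in the spirit of contribution (2) would replace the `key` function with a score that also takes the feature weight into account; the exact form of that score is not given here.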