| As a statistical technique that can effectively analyze and process uncertain information (imprecise, incomplete, or inconsistent), rough sets have been widely used in knowledge discovery in databases, machine learning, pattern recognition, decision support, predictive modeling, and fault diagnosis. The classic Pawlak rough set is suitable only for discrete data and cannot directly handle the numerical data that is common in practical applications. To solve this problem, neighborhood rough sets extend the Pawlak rough set with the concepts of neighborhood granulation and rough approximation, so that both discrete and numerical data can be handled effectively. Knowledge reduction is one of the main applications of rough sets; its goal is to delete redundant knowledge from a knowledge expression system while maintaining the original knowledge expression ability. Research on knowledge reduction based on rough sets consists of two parts: attribute reduction algorithms for decision tables and attribute reduction algorithms for information tables, which correspond to classification and clustering tasks respectively. Based on neighborhood rough sets, this paper studies attribute reduction for decision tables and information tables as follows. (1) Attribute reduction for decision tables based on neighborhood rough sets: Ⅰ. For an attribute reduction algorithm based on the neighborhood rough set, the calculation of the positive region is the basis of its efficiency and accounts for most of its time cost. Existing positive region calculations usually spend a large number of distance measurements on pairs of samples that belong to the same category. To address this, this paper first proves that measurements between samples of the same category have no effect on the positive region calculation in neighborhood rough sets. Based on this proof, a positive region calculation and an attribute reduction algorithm based on sample categories are proposed. The experimental results show that the proposed algorithm is effective and faster, and that it is better suited to data sets with fewer sample categories. Ⅱ. The positive region calculation of neighborhood rough sets follows the inclusion relation of Pawlak rough sets, which results in poor fault tolerance. Therefore, the minimum risk decision rule is used to evaluate the risk of the calculation, and a new positive region calculation and an attribute reduction algorithm based on this fault-tolerant improvement are proposed. The experimental results show that the attribute reduction set obtained by the proposed algorithm is better, and that a classification algorithm based on this set achieves higher accuracy. (2) Attribute reduction for information tables based on neighborhood rough sets: To design an attribute reduction algorithm for information tables, a knowledge reduction criterion for information tables of neighborhood rough sets, together with an attribute reduction algorithm based on this criterion, is proposed by building on the knowledge reduction criterion for information tables of Pawlak rough sets. The experimental results show that the attribute reduction obtained by the proposed algorithm contains more attributes, and that a clustering algorithm based on this set achieves higher accuracy. |
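
The observation in (1) Ⅰ can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the common definition in which a sample belongs to the positive region when every sample within distance `delta` of it has the same category, and it assumes Euclidean distance. The function name `positive_region` and the parameter `delta` are hypothetical. Under these assumptions, only pairs of samples from different categories need to be measured.

```python
from math import dist  # Euclidean distance, Python 3.8+


def positive_region(samples, labels, delta):
    """Return indices of samples whose delta-neighborhood holds no other category.

    Assumed definition: the neighborhood of x is every sample within
    distance delta of x, and x is in the positive region when all of
    those samples share x's label.
    """
    positive = []
    for i, (x, label) in enumerate(zip(samples, labels)):
        # Same-category pairs are skipped: they cannot make x inconsistent.
        consistent = all(
            dist(x, other) > delta
            for other, other_label in zip(samples, labels)
            if other_label != label
        )
        if consistent:
            positive.append(i)
    return positive


# Toy data: one numerical attribute, two categories.
samples = [(0.0,), (0.1,), (0.25,), (1.0,)]
labels = [0, 0, 1, 1]
print(positive_region(samples, labels, delta=0.2))  # [0, 3]
```

In this toy data the samples at 0.1 and 0.25 lie within 0.2 of each other and have different categories, so both fall outside the positive region. The distance between 0.0 and 0.1 is never computed.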