Research On Attribute Reduction Algorithm Of Neighborhood Rough Set And Its Application

Posted on: 2023-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
GTID: 2568306794457154
Subject: Control engineering

Abstract/Summary:
Reducing the attributes of high-dimensional data without losing information, so as to lower the cost of data transmission and storage and ease subsequent data mining, is an important topic in data science. Rough set attribute reduction is widely used for high-dimensional data because it requires no prior knowledge beyond the data itself. The neighborhood rough set is an extension of the traditional rough set model. By introducing the concepts of neighborhood granulation and granularity space, it handles continuous data without discretization, which reduces the information loss that discretization causes in the traditional rough set model and thus expands the application scope of rough set theory. This thesis proposes targeted methods for problems in existing neighborhood rough set attribute reduction algorithms, including the neighborhood granulation strategy, the design of the measurement function, and attribute reduction on small-sample data. The main work is summarized as follows:

1. Existing neighborhood rough set models lose useful information because they ignore the distribution of samples within the neighborhood. To address this, an attribute reduction algorithm based on weighted neighborhood dependency is proposed. The algorithm introduces a weighted neighborhood that accounts for the distribution of distances between samples in the neighborhood, and defines a weighted neighborhood dependency function for heuristic attribute reduction. The algorithm is applied to 12 public UCI data sets and a corn seed hyperspectral data set collected in the laboratory, and three common classifiers are trained on the reduced data to verify its classification ability. Compared with two comparison algorithms, the average classification accuracy of the reduction results on the UCI data sets is 84.37%, an improvement of 2.02%, with the average dimension essentially unchanged; on the corn seed hyperspectral data set, the average classification accuracy is 89.00%, an improvement of 0.26%, and the average dimension is 20.00, a reduction of 6.50. The proposed algorithm therefore has good classification performance.

2. Existing neighborhood rough set attribute reduction algorithms suffer from information loss when processing continuous data, inconsistent information introduced by the granulation strategy, and parameters that are difficult to optimize. To address these problems, a continuous-space attribute reduction algorithm based on discrimination is proposed. The algorithm makes full use of the distribution information in the data itself and, with the aim of improving classification performance, directly defines intra-class discrimination and inter-class discrimination to reflect the consistency within a class and the separation between different classes. It then determines the optimal reduction set by minimizing intra-class discrimination and maximizing inter-class discrimination, in order to improve the performance of subsequent classifiers. Experimental results show that the reduction sets obtained by the algorithm on the 12 UCI data sets and the corn seed hyperspectral data set reach average classification accuracies of 85.85% and 89.21%, with average dimensions of 5.00 and 29.00 respectively, a better classification performance than the six comparison algorithms.

3. When an incremental data set is small, it is difficult to evaluate the importance of attributes accurately, which makes attribute reduction difficult. To address this, an extended attribute reduction algorithm for incremental data is studied. Borrowing the idea of neighborhood granulation from neighborhood rough set theory, the algorithm constructs a neighborhood relationship between historical data and incremental data, then uses this relationship to select historical data and add it to the incremental data set, so as to solve the problem of concept drift between different data sets. Performing attribute reduction on the expanded data set improves the reduction performance of attribute reduction algorithms on small-sample incremental data. Experiments with five attribute reduction algorithms show that the proposed extended algorithm reaches an average classification accuracy of 84.25% on the 12 UCI data sets, with an average reduction dimension of 5.80; on the corn seed hyperspectral incremental data set, the average classification accuracy is 89.48% and the average reduction dimension is 23.60. The proposed algorithm therefore has good classification performance.
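
For readers unfamiliar with the baseline that the three contributions build on, the following is a minimal sketch of classical (unweighted) neighborhood dependency with greedy forward attribute selection. It is not the weighted dependency function, the discrimination measure, or the incremental extension proposed in the thesis, none of which the abstract defines. The function names `neighborhood`, `dependency` and `greedy_reduct`, the radius `delta`, the tolerance `eps`, the Euclidean distance, and the toy data are all illustrative assumptions.

```python
from math import sqrt

def neighborhood(data, i, attrs, delta):
    """Indices of samples within Euclidean distance delta of sample i,
    measured only on the attribute indices in attrs."""
    xi = data[i]
    return [
        j for j, xj in enumerate(data)
        if sqrt(sum((xi[a] - xj[a]) ** 2 for a in attrs)) <= delta
    ]

def dependency(data, labels, attrs, delta):
    """Fraction of samples whose whole neighborhood carries their own label
    (size of the positive region divided by the number of samples)."""
    if not attrs:
        return 0.0
    consistent = sum(
        1 for i in range(len(data))
        if all(labels[j] == labels[i]
               for j in neighborhood(data, i, attrs, delta))
    )
    return consistent / len(data)

def greedy_reduct(data, labels, delta, eps=1e-9):
    """Forward selection: keep adding the attribute that raises the
    dependency most, and stop when no attribute improves it."""
    remaining = list(range(len(data[0])))
    selected, best = [], 0.0
    while remaining:
        scored = [(dependency(data, labels, selected + [a], delta), a)
                  for a in remaining]
        # Highest dependency wins; ties go to the lowest attribute index.
        score, attr = max(scored, key=lambda t: (t[0], -t[1]))
        if score - best <= eps:
            break
        selected.append(attr)
        remaining.remove(attr)
        best = score
    return selected, best

# Toy continuous data (assumed): attribute 0 separates the classes,
# attribute 1 does not.
data = [[0.10, 0.50], [0.20, 0.90], [0.15, 0.10],
        [0.80, 0.50], [0.90, 0.85], [0.85, 0.15]]
labels = [0, 0, 0, 1, 1, 1]

print(greedy_reduct(data, labels, delta=0.3))  # prints ([0], 1.0)
```

Because neighborhoods are built from distances, no discretization of the continuous attributes is needed. The result is sensitive to the choice of `delta`, which relates to the parameter-optimization difficulty noted in item 2.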
Keywords/Search Tags: attribute reduction, classification accuracy, continuous data, neighborhood rough set, dependence