With the rapid development of information technology, large amounts of data are collected and stored in databases, which brings many challenges to data analysis and knowledge extraction. Such data usually contain redundant information, which can degrade system performance. It is therefore crucial to remove this redundancy without reducing the useful information that the data carry. Feature selection is a highly effective method for this purpose and has been applied in many areas, such as machine learning, data mining, and pattern recognition. It selects the most effective features from the raw features to reduce the dimensionality of a dataset, and it is both an important means of improving the performance of learning algorithms and a key data preprocessing step. However, two properties of modern data limit existing feature selection methods. First, the generation of modern data is decoupled from the analysis process, which makes the form and structure of data increasingly complex. Second, modern data reflect the data-intensive scientific paradigm and are typically high-dimensional.

To overcome these limitations, this thesis introduces strategies such as information granulation into feature selection from the perspective of granular computing (GrC). First, we apply the granular ball as an adaptive granulation method to feature selection. Second, we characterize multigranulation rough set (MGRS) models from multiple views by combining a variety of granulation methods. Third, we construct a new feature selection framework by introducing a granularity filter. Finally, we propose a new concept, granularity over specific-class, which improves the generalization performance of the learner, the time efficiency of problem solving, and the stability of feature selection results. Specifically, the contents and innovations of this study cover the following four points:

1. A fast feature selection method based on the granular ball rough set
The traditional neighborhood rough set requires a radius to be specified in advance, or searched for until one suits the problem, which makes data preprocessing very time-consuming. The granular ball rough set, by contrast, generates an appropriate granular structure adaptively from the data distribution, and its use of granular ball purity as the evaluation criterion offers new ideas for feature selection. However, when a reduct is computed by forward greedy search, the change in granular purity caused by adding each candidate feature to the pool of selected features must be calculated, which poses a serious challenge to the efficiency of the algorithm. To solve this problem, we propose a feature partitioning strategy for the forward greedy search, which divides all features into different groups so that the search space of candidate features is compressed and feature selection is accelerated.

2. A multi-view Triple-G MGRS model
Unlike classical rough sets, MGRS usually approximate the target concept using the results of multiple information granulations. Although many forms of MGRS have been studied in depth, most of them are based on homogeneous information granules at different scales or levels and lack the results of granulating heterogeneous information from multiple perspectives. In view of this, we construct a new multigranulation rough set model, Triple-G MGRS. This model embodies the basic principle of heterogeneous information granulation: it describes the target concept using both parameterized information granulation and data-adaptive information granulation. In terms of generalization performance, the feature selection results obtained with Triple-G MGRS are better than those of previous studies.

3. A GLEE model based on a granularity filter
In the traditional feature selection framework, every remaining feature must be evaluated in turn to select the appropriate one to add to the feature subset. The time consumption of this approach therefore becomes unacceptable when the amount of data is large. Moreover, the selected features are unstable when the data are perturbed. For this reason, we present an effective feature selection framework, GLEE (Granularity fiLter for fEature sElection). Specifically, GLEE constructs granularity to characterize each feature effectively and quantitatively, and then filters out some of the inappropriate features. This framework can be applied to different algorithms, and it makes the algorithms equipped with it learn faster and produce more stable results.

4. A GIFT model based on granularity over specific-class
As the basis of GrC, information granulation provides a new idea for feature selection. Although applying information granulation techniques to feature selection has achieved fruitful results, their potential in feature evaluation has not received enough attention. In view of this, we propose a new granularity over specific-class from the perspective of information granulation. Essentially, this granularity fuses intra-class and extra-class granularity, which better characterizes the discriminative ability of features. On this basis, we further propose GIFT (Granularity over specIfic-class for Feature selecTion), which obtains satisfactory feature selection results more quickly and stably without significantly reducing the classification ability of the learner.
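To make the efficiency problem in point 1 concrete, here is a minimal sketch of the conventional baseline that the granular ball method and the feature partitioning strategy aim to improve on. It is forward greedy search over a neighborhood rough set with a user-specified radius, scored by dependency, defined as the fraction of samples in the positive region. It is not the thesis's granular ball method. The following choices are illustrative assumptions and do not come from this text: Euclidean distance, a fixed `radius`, stopping when dependency stops increasing, and the function names `neighborhood_dependency` and `forward_greedy_selection`.

```python
import math
from typing import List, Sequence, Tuple


def neighborhood_dependency(X: Sequence[Sequence[float]], y: Sequence[int],
                            subset: List[int], radius: float) -> float:
    """Dependency of the labels on `subset`: |POS_B(D)| / |U| (assumed definition).

    A sample lies in the positive region if every sample within `radius`
    (Euclidean distance over the features in `subset`) has the same label.
    """
    n = len(X)
    consistent = 0
    for i in range(n):
        pure = True
        for j in range(n):
            dist = math.sqrt(sum((X[i][k] - X[j][k]) ** 2 for k in subset))
            if dist <= radius and y[j] != y[i]:
                pure = False  # a differently labelled neighbour: not in POS
                break
        if pure:
            consistent += 1
    return consistent / n


def forward_greedy_selection(X: Sequence[Sequence[float]], y: Sequence[int],
                             radius: float = 0.1) -> Tuple[List[int], float]:
    """Hypothetical baseline: add the feature that maximizes dependency."""
    n_features = len(X[0])
    selected: List[int] = []
    best = 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        # Every remaining candidate is re-evaluated at every step; this is
        # the cost that grows with dimensionality.
        scores = {f: neighborhood_dependency(X, y, selected + [f], radius)
                  for f in candidates}
        f_best = max(candidates, key=scores.__getitem__)
        if scores[f_best] <= best:  # stop when dependency no longer increases
            break
        selected.append(f_best)
        best = scores[f_best]
    return selected, best


X = [[0.0, 0.5], [0.05, 0.9], [1.0, 0.5], [0.95, 0.1]]
y = [0, 0, 1, 1]
print(forward_greedy_selection(X, y, radius=0.1))  # ([0], 1.0)
```

In this toy data, feature 0 alone separates the two classes within the radius, so the search stops after one step. The inner loop over all remaining candidates, combined with the need to choose `radius` in advance, is the cost that the text attributes to the traditional approach.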