| Feature selection is one of the most important methods in the data preprocessing stage; it can reduce model training time and improve learning results. However, data is now generated much faster than in the past, so feature selection algorithms face serious challenges. (1) Feature spaces tend to be high-dimensional, unknown, and evolving. In practical application scenarios, features often enter the feature space over time, so not all features can be captured at once, and feature selection must be able to process streaming features; (2) data is often class-imbalanced; (3) data categories tend to have hierarchical rather than independent relationships. Traditional feature selection can no longer handle such data well. This thesis studies existing streaming feature selection methods and proposes a new streaming feature selection algorithm for imbalanced data. The main research contents are as follows: (1) Online streaming feature selection for long-tailed data. Because existing feature selection algorithms do not consider class imbalance, this thesis takes into account the distribution and characteristics of imbalanced data and the hierarchical relationships in the data, and reduces the imbalance with the help of sibling relationships in the hierarchy. A neighborhood rough set model is defined, and neighborhood dependency and importance are computed to select features with high separability between rare and normal classes, yielding an online streaming feature selection algorithm. Experimental results show that the proposed algorithm handles the classification of class-imbalanced data better. (2) Online streaming feature selection based on hierarchical neighborhood rough sets. Many existing online streaming feature selection algorithms require prior knowledge, and setting a uniform parameter across different neighborhoods is problematic. This thesis first defines an adaptive neighborhood rough set model for hierarchical data. It then designs a hierarchical dependency of features on labels and dynamically selects important features by computing online importance and online redundancy. Finally, experiments demonstrate the effectiveness of the algorithm. |
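
To illustrate the neighborhood dependency and online importance/redundancy checks mentioned in the abstract, here is a minimal sketch of the standard flat neighborhood rough set case. It is not the thesis's algorithm: it ignores the class hierarchy, sibling relationships, imbalance handling, and the adaptive neighborhood, and it assumes a fixed radius `delta` and Euclidean distance. The names `neighborhood_dependency` and `stream_select` and the toy data are hypothetical.

```python
from math import sqrt


def neighborhood_dependency(X, y, features, delta):
    """Fraction of samples whose delta-neighborhood, measured on
    `features` only, contains no sample with a different label."""
    if not features:
        return 0.0
    consistent = 0
    for i in range(len(X)):
        pure = True
        for j in range(len(X)):
            dist = sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in features))
            if dist <= delta and y[j] != y[i]:
                pure = False
                break
        if pure:
            consistent += 1
    return consistent / len(X)


def stream_select(X, y, delta=0.15):
    """Visit features one at a time, as if they arrived in a stream.
    `delta` is an assumed fixed radius, not an adaptive one."""
    selected, best = [], 0.0
    for f in range(len(X[0])):
        dep = neighborhood_dependency(X, y, selected + [f], delta)
        if dep > best:  # importance check: keep f only if dependency rises
            selected.append(f)
            best = dep
            # Redundancy check: drop earlier features whose removal
            # leaves the dependency unchanged.
            for g in selected[:-1]:
                rest = [h for h in selected if h != g]
                if neighborhood_dependency(X, y, rest, delta) >= best:
                    selected = rest
    return selected, best


# Toy data: feature 0 is constant, feature 1 separates the classes,
# feature 2 duplicates feature 1.
X = [
    [0.5, 0.1, 0.1],
    [0.5, 0.2, 0.2],
    [0.5, 0.8, 0.8],
    [0.5, 0.9, 0.9],
]
y = [0, 0, 1, 1]

selected, dependency = stream_select(X, y)
print(selected, dependency)
```

On this toy data the constant feature is rejected because it yields zero dependency, and the duplicate feature is rejected because it adds nothing once feature 1 has been kept.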