Online Streaming Feature Selection Method For High-dimensional Unbalanced Data

Posted on:2024-01-29

Degree:Master

Type:Thesis

Country:China

Candidate:K J Fan

GTID:2568307064955689

Subject:Computer technology

Abstract/Summary:

Feature selection is one of the most important methods in the data preprocessing stage; it can reduce model training time and improve learning results. However, data is now generated much faster than in the past, so feature selection algorithms face serious challenges: (1) Data feature spaces tend to be high-dimensional, unknown, and evolving. In practical application scenarios, features often enter the feature space over time, which makes it difficult to obtain all features at once and requires feature selection to be able to process streaming features; (2) data are often class-imbalanced; (3) data categories tend to have hierarchical rather than independent relationships. Traditional feature selection can no longer handle such data well. In this thesis, existing streaming feature selection methods are studied, and a new streaming feature selection algorithm is proposed for imbalanced data. The main research contents are as follows:

(1) Online streaming feature selection for long-tail distribution data. Existing feature selection algorithms do not consider class imbalance. To address this, the thesis considers the distribution and characteristics of imbalanced data and the hierarchical relationships in the data, and reduces the imbalance with the help of sibling relationships in the data. A neighborhood rough set model is defined, and by computing neighborhood dependency and importance, features with high separability between rare classes and normal classes are selected, which yields an online streaming feature selection algorithm. Experimental results show that the proposed algorithm deals better with the classification of class-imbalanced data.

(2) Online streaming feature selection based on hierarchical neighborhood rough sets. Many existing online streaming feature selection algorithms require prior knowledge, and setting one uniform parameter across different neighborhoods is problematic. This thesis first defines an adaptive neighborhood rough set model for hierarchical data. Second, a hierarchical dependency of features on labels is designed, and important features are selected dynamically by computing online importance and online redundancy. Finally, experiments demonstrate the effectiveness of the algorithm.
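
To make the idea of neighborhood dependency more concrete, the following is a minimal sketch in Python of the textbook neighborhood rough set dependency and a greedy streaming loop built on it. It is not the thesis's algorithm: the function names `neighborhood_dependency` and `stream_select`, the fixed radius `delta`, the threshold `min_gain`, and the toy data are all assumptions made for illustration, and the sketch ignores class imbalance, label hierarchy, and adaptive neighborhoods.

```python
import math


def neighborhood_dependency(X, y, features, delta=0.2):
    """Share of samples whose delta-neighborhood contains a single label.

    X is a list of rows, y a list of labels, features a list of column
    indices. The fixed radius `delta` is an illustrative assumption.
    """
    if not features:
        return 0.0

    def dist(a, b):
        # Euclidean distance restricted to the chosen feature columns.
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

    consistent = 0
    for i, xi in enumerate(X):
        # A sample belongs to the positive region when every sample
        # within distance delta carries the same label.
        if all(y[j] == y[i] for j, xj in enumerate(X) if dist(xi, xj) <= delta):
            consistent += 1
    return consistent / len(X)


def stream_select(X, y, delta=0.2, min_gain=1e-6):
    """Greedy sketch: keep an arriving feature only if it raises dependency."""
    selected, current = [], 0.0
    for f in range(len(X[0])):  # features arrive one at a time
        candidate = neighborhood_dependency(X, y, selected + [f], delta)
        if candidate - current > min_gain:
            selected.append(f)
            current = candidate
    return selected


# Toy data (hypothetical): column 0 is constant noise, column 1 separates
# the two classes, column 2 duplicates column 1 and is therefore redundant.
X = [
    [0.5, 0.0, 0.0],
    [0.5, 0.1, 0.1],
    [0.5, 0.9, 0.9],
    [0.5, 1.0, 1.0],
]
y = [0, 0, 1, 1]

print(stream_select(X, y))  # [1]
```

In this toy run the noise column is rejected because it leaves the dependency at 0, the informative column raises it to 1, and the duplicate column adds nothing and is skipped as redundant.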

Keywords/Search Tags:

Online streaming feature selection, Hierarchical classification, Neighborhood rough set, Long-tail data

Related items

1	Research On Online Streaming Feature Selection Algorithms
2	Research On Neighborhood Rough Set Model For Streaming Feature Selection
3	Online Hierarchical Feature Selection Algorithms With Streaming Features
4	Online Streaming Feature Selection Algorithms Of High-dimension And Class-imbalanced Data
5	Online Streaming Feature Selection Based On Adaptive Neighborhood Rough Set
6	Feature Selection Based On Neighborhood Rough Sets Method And Its Application
7	Online Streaming Feature Selection Method Based On Neighborhood Dependence
8	Research On Online Hierarchical Streaming Feature Selection Algorithm Based On Decision Error Rate
9	Feature Selection Of Information Systems Based On Neighborhood Tolerance Rough Sets
10	Online Learning Algorithms For Classification Of Streaming Data