
Feature Selection of Unbalanced Data

Posted on: 2021-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: C Cai
GTID: 2427330602483984
Subject: Applied statistics
Abstract/Summary:
With the rapid development of 5G communication and other intelligent technologies, society has entered the era of big data. In both the physical world and virtual networks, more and more data is generated continuously, and the types of data are becoming increasingly diverse. This provides ample support for machine learning, data mining, and related fields, but it also brings many challenges. In the big-data era, data mining frequently has to deal with high-dimensional data, where feature selection is needed to filter out redundant and noisy features. Many real-world classification problems involve imbalanced data, for example face recognition, customer churn prediction, email filtering, and text classification. Imbalanced classification and high dimensionality often occur together, and class imbalance biases the feature selection process. This thesis therefore studies the impact of data imbalance on feature selection and possible remedies.

Building on the original Relief feature selection algorithm, this thesis proposes first oversampling the data with k-means SMOTE and then applying Relief to select features. The approach is evaluated in Python on the MUSK dataset from the UCI repository, and the two feature selection approaches are compared by their performance with three classification algorithms. In addition, the thesis explores the choice among three feature selection methods and three classification algorithms on imbalanced datasets.
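The two-step pipeline described above (oversample with k-means SMOTE, then rank features with Relief) can be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The k-means SMOTE here is simplified: it keeps clusters whose minority share reaches a hypothetical `threshold` and allocates synthetic samples in proportion to each cluster's minority count, rather than using the sparsity-based weighting of the published method. `relief` is the basic two-class Relief of Kira and Rendell. All function names, parameters (`k_clusters`, `k_neighbors`, `threshold`), and the toy data are hypothetical.

```python
# Hypothetical sketch, not the thesis's code: simplified k-means SMOTE + two-class Relief.
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def kmeans_smote(X, y, minority=1, k_clusters=4, k_neighbors=3, threshold=0.5, seed=0):
    """Simplified k-means SMOTE: oversample the minority class up to the majority size,
    creating synthetic points only inside clusters whose minority share >= threshold."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X, k_clusters, rng)
    n_needed = int((y != minority).sum() - (y == minority).sum())
    # Candidate clusters: at least 2 minority points to interpolate, high minority share.
    clusters = [j for j in range(k_clusters)
                if (y[labels == j] == minority).sum() >= 2
                and (y[labels == j] == minority).mean() >= threshold]
    if n_needed <= 0 or not clusters:
        return X, y  # nothing to do, or no suitable cluster (assumed fallback)
    counts = np.array([(y[labels == j] == minority).sum() for j in clusters], dtype=float)
    alloc = np.floor(n_needed * counts / counts.sum()).astype(int)
    alloc[0] += n_needed - alloc.sum()  # give the rounding remainder to the first cluster
    synthetic = []
    for j, n_j in zip(clusters, alloc):
        P = X[(labels == j) & (y == minority)]
        k = min(k_neighbors, len(P) - 1)
        for _ in range(n_j):
            i = rng.integers(len(P))
            d = ((P - P[i]) ** 2).sum(axis=1)
            neighbor = P[rng.choice(np.argsort(d)[1:k + 1])]  # skip index 0 (the point itself)
            # SMOTE step: random point on the segment between P[i] and a minority neighbour.
            synthetic.append(P[i] + rng.random() * (neighbor - P[i]))
    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(len(synthetic), minority)])
    return X_new, y_new


def relief(X, y, n_iter=None, seed=0):
    """Two-class Relief: a larger weight means the feature separates the classes better.
    Assumes every class has at least two instances."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = n if n_iter is None else n_iter
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero for constant features
    Xs = (X - X.min(axis=0)) / span  # scale to [0, 1] so per-feature diffs are comparable
    w = np.zeros(p)
    for i in rng.integers(0, n, size=m):
        d = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to every instance
        d[i] = np.inf  # exclude the sampled instance itself
        same = y == y[i]
        hit = np.where(same, d, np.inf).argmin()    # nearest neighbour of the same class
        miss = np.where(~same, d, np.inf).argmin()  # nearest neighbour of the other class
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / m
    return w


# Hypothetical toy data (not MUSK): feature 0 separates the classes, features 1-2 are noise.
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(90, 3))
X_min = rng.normal(0.0, 1.0, size=(10, 3))
X_min[:, 0] += 10.0
X = np.vstack([X_maj, X_min])
y = np.array([0] * 90 + [1] * 10)

X_bal, y_bal = kmeans_smote(X, y)     # step 1: oversample the minority class
weights = relief(X_bal, y_bal)        # step 2: score features on the rebalanced data
ranking = np.argsort(weights)[::-1]   # feature indices, most relevant first
```

The top-ranked features would then be passed to the classifiers being compared. In practice, one might replace the hand-written oversampler with the `KMeansSMOTE` class from the imbalanced-learn package, which implements the full published method.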
Keywords/Search Tags: data imbalance, feature selection, data mining, machine learning