Feature Selection Of Unbalanced Data

Posted on:2021-04-30

Degree:Master

Type:Thesis

Country:China

Candidate:C Cai

Full Text:PDF

GTID:2427330602483984

Subject:Applied statistics

Abstract/Summary:

With the rapid development of 5G communication technology and various intelligent technologies, society has entered the era of big data. In both real life and virtual networks, more and more data is continuously generated, and the types of data are becoming increasingly diverse. This provides ample support for machine learning, data mining, and related fields, but it also brings many challenges. In the era of big data, data mining often faces high-dimensional data, and feature selection methods are needed to filter out redundant information and noise. Classification problems encountered in practice, such as face recognition, customer churn, email filtering, and text classification, often involve imbalanced data. Imbalanced classification problems and high-dimensional data problems frequently overlap, and class imbalance biases the feature selection process, so this thesis focuses on the impact of data imbalance on feature selection and on ways to address it. Building on the original Relief feature selection algorithm, the thesis proposes first oversampling the data with K-means SMOTE and then selecting features with Relief. The effect is verified in Python on the MUSK dataset from the UCI database, and the performance of the two feature selection approaches is compared across three classification algorithms. In addition, the thesis explores the choice among three feature selection methods and three classification algorithms on imbalanced datasets.
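The oversample-then-select pipeline described in the abstract can be illustrated with a minimal sketch. This is not the thesis code. It is pure Python on a hypothetical toy dataset rather than MUSK. It substitutes plain SMOTE interpolation for K-means SMOTE (the clustering step is omitted). It uses a basic nearest-hit / nearest-miss Relief update that visits every sample once, which may differ from the variant the author used. The function names `smote_oversample` and `relief_weights` are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_oversample(minority, n_new, k=2, seed=0):
    """Plain SMOTE-style interpolation (assumption: no k-means step).

    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def relief_weights(X, y):
    """Basic two-class Relief-style weights, visiting every sample once.

    A feature gains weight when it differs at the nearest miss (other
    class) and loses weight when it differs at the nearest hit (same class).
    """
    n, d = len(X), len(X[0])
    # Range of each feature, used to scale differences into [0, 1].
    span = [
        (max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)
    ]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [X[t] for t in range(n) if t != i and y[t] == y[i]]
        misses = [X[t] for t in range(n) if y[t] != y[i]]
        hit = min(hits, key=lambda r: dist(X[i], r))
        miss = min(misses, key=lambda r: dist(X[i], r))
        for j in range(d):
            w[j] += (diff(X[i], miss, j) - diff(X[i], hit, j)) / n
    return w


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
majority = [[0.0, 0.1], [0.1, 0.9], [0.2, 0.4],
            [0.15, 0.7], [0.05, 0.3], [0.1, 0.5]]
minority = [[0.9, 0.2], [1.0, 0.8], [0.95, 0.5]]

# Balance the classes first, then score features on the balanced set.
synthetic = smote_oversample(minority, n_new=len(majority) - len(minority))
X = majority + minority + synthetic
y = [0] * len(majority) + [1] * (len(minority) + len(synthetic))
weights = relief_weights(X, y)
print(weights)
```

On this toy data the discriminative feature receives the larger weight. In practice a library implementation, for example `KMeansSMOTE` from imbalanced-learn, would likely replace the hand-written oversampler.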

Keywords/Search Tags:

data imbalance, feature selection, data mining, machine learning

Related items

1	Research On Adopting Data Mining And Machine Learning In E-Learning Towards Personalization And Security
2	Mining Web-based Learning System Data To Detect Different Pattern Of The Student During Completing Course
3	Exploring Online Learner Behavior Based On Educational Data Mining And Machine Learning
4	Research And Application Of Performance Prediction Model Based On Data Mining
5	Research On College Students’ Academic Achievement Based On Trajectory Data Mining
6	Research On The Application Of Blue Ink Cloud-Based Data Analysis In The Personalized Learning Mode Of College Students
7	Design And Implementation Of College Student User Portrait System Based On Data Mining Technology
8	A Research On Feature Extraction And Feature Selection Of Programming Process For Programming Education
9	A Data Mining On Emotion Analysis And Knowledge Difficulty In Chinese MOOC Forum
10	Analysis And Evaluation Of Preference Selection For National College Entrance Examination Based On OLAP And Data Mining