Research On Unbalanced Data Classification Based On Ensemble Learning

Posted on:2024-01-15

Degree:Master

Type:Thesis

Country:China

Candidate:S J Li

GTID:2568307085958809

Subject:Computer software and theory

Abstract/Summary:

In the era of big data, imbalanced data classification has become an active research direction in data mining. Real-world tasks such as natural disaster prediction, financial risk assessment, and network intrusion detection are all imbalanced classification problems. Imbalanced data sets pose significant challenges to traditional classification algorithms: class imbalance severely affects model accuracy and produces biased models that perform poorly on minority classes. This study addresses the imbalance problem from two aspects: data preprocessing and optimization of classification algorithms. The specific research contents are as follows.

In terms of data preprocessing, to address the shortcomings of the traditional SMOTE and ADASYN oversampling algorithms, the WSA (Weighted SMOTE-ADASYN) oversampling algorithm is proposed. It combines the advantages of SMOTE and ADASYN to oversample the minority class and thereby balance the data set. First, a minority sample point is selected, its K nearest neighbors are computed, and the distribution of majority-class and minority-class sample points around it is counted. Second, the imbalance loss of the data set is determined, the number of sample points to be synthesized is calculated, and the position of the minority sample is determined through the ratio method. Finally, either the weighted SMOTE method or the ADASYN method is called to synthesize sample points, depending on the position of the minority sample point.

In terms of classification algorithms, to improve the classification accuracy of common ensemble algorithms, a weighted random forest algorithm (RRF) based on hierarchical sampling of Relief features is proposed. First, the Relief algorithm is used to calculate the feature weights of each data set, and the features are divided into layers according to their weights. Then, when the random forest performs Bootstrap sampling, samples are drawn uniformly from the layered features, which reduces the interference of low-relevance features with the classification results. Next, each decision tree is assigned a weight based on its individual classification performance, which further improves the classification effect.

Finally, the oversampling algorithm and the classification algorithm proposed in this thesis are combined into an imbalanced data classification framework, and experimental verification is carried out on imbalanced UCI data sets. Measured by F-measure, AUC, G-mean, and other indicators, the experiments show that the proposed algorithms outperform traditional oversampling algorithms and ensemble learning classification algorithms on imbalanced data.
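The evaluation indicators named above are used because plain accuracy can look good on imbalanced data even when the minority class is ignored. The following is a minimal sketch of F-measure and G-mean computed from confusion counts, using their standard textbook definitions. It is not taken from the thesis: the function names, the toy labels, and the choice of label 1 as the minority (positive) class are assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class (assumption)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f_measure(tp, fp, fn, beta=1.0):
    """F-measure: weighted harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(tp, fp, tn, fn):
    """G-mean: geometric mean of minority recall and majority recall."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy data: 2 minority samples (label 1), 8 majority samples (label 0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10                    # never predicts the minority class
mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]        # one hit, one miss, one false alarm

for name, y_pred in [("always_majority", always_majority), ("mixed", mixed)]:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    print(name, accuracy, f_measure(tp, fp, fn), g_mean(tp, fp, tn, fn))
```

On this toy data, a classifier that always predicts the majority class reaches 0.8 accuracy but scores 0 on both F-measure and G-mean, which is why such indicators are preferred for imbalanced problems.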

Keywords/Search Tags:

Unbalanced data set, Integrated learning, WSA, Feature layerin

Related items

1	Research On Malicious Web Page Recognition Based On Hybrid Feature Selection And Subsampling Multilayer Integrated Learning
2	Research On Credit Risk Evaluation Under Unbalanced Data Set Based On Integrated Learning
3	Research And Application Of Integrated Algorithms For Unbalanced Data Sets
4	Application Research Of Unbalanced Data Classification Algorithm Based On Integrated Learning
5	Research On Employee Turnover Prediction Based On SMOTE-SVM Under Unbalanced Data
6	Research On Classification Algorithm For Unbalanced Data
7	Research On High-dimensional Unbalanced Data Classification Algorithm Based On Feature Selection And Ensemble Learning
8	Selection And Classification Of Unbalanced Data Based On Semi-Supervised And Integrated Learning
9	Research On Speaker Recognition Based On Integrated Learning
10	Research On Federated Learning Methods For Unbalanced Data