Application Of Hybrid XGBoost Model In Unbalanced Dataset Classification Prediction

Posted on:2019-01-19

Degree:Master

Type:Thesis

Country:China

Candidate:L S Cui

Full Text:PDF

GTID:2348330569489328

Subject:Applied statistics

Abstract/Summary:

Classification problems arise frequently in everyday decision-making. Traditional classification algorithms assume that the dataset is balanced or that every class has the same misclassification cost, but real-world datasets are generally unbalanced, especially in fields such as medical diagnosis and commodity recommendation. Studying classification algorithms for unbalanced datasets is therefore valuable for solving practical problems. This paper first reviews the literature and introduces in detail the existing solutions to the unbalanced classification problem at both the dataset level and the classification algorithm level. It then proposes the Hybrid XGBoost model, which combines a re-sampling algorithm with the XGBoost algorithm, to handle binary classification on unbalanced datasets, and applies the model to predicting users' commodity preferences. To build the prediction model of users' product preferences, 31 feature variables are selected from four aspects, and whether a user will purchase the recommended product B is predicted with a logistic regression model, a hybrid logistic regression model, an AdaBoost model, a random forest model, an XGBoost model, and a Hybrid XGBoost model. A comparative analysis using Recall, F1, AUC, and other indicators shows that the EasyEnsemble-XGB model has the best prediction performance. An analysis of the feature importance of the EasyEnsemble-XGB model yields five important features, which help portray the target user more accurately. For practical application of unbalanced-dataset classification models, this paper proposes adjusting the classification threshold according to the actual business goal when outputting class labels, instead of using a fixed threshold of 0.5.
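The two mechanical ideas in the abstract, EasyEnsemble-style re-sampling and a business-driven classification threshold, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy labels and scores, and the choice of "highest threshold that reaches a target recall" as the business goal are all assumptions made for illustration, and the base learner (XGBoost in the thesis) is left out so that the code runs with the Python standard library alone.

```python
import random


def balanced_subsets(labels, n_subsets, seed=0):
    """EasyEnsemble-style sampling (hypothetical helper).

    Each subset keeps every minority (label 1) index and adds an equally
    sized random sample of majority (label 0) indices. One base learner
    would then be trained per subset and their outputs combined.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(minority), len(majority))
    return [minority + rng.sample(majority, k) for _ in range(n_subsets)]


def recall_precision_f1(labels, scores, threshold):
    """Label a sample positive when its score >= threshold, then score it."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f1


def threshold_for_recall(labels, scores, target_recall):
    """Return the highest observed score usable as a threshold that still
    reaches target_recall, or None if no threshold does."""
    for t in sorted(set(scores), reverse=True):
        if recall_precision_f1(labels, scores, t)[0] >= target_recall:
            return t
    return None


# Toy data (assumed): 8 negatives, 2 positives, with predicted probabilities.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.40, 0.70]

print(balanced_subsets(labels, n_subsets=3))
print(recall_precision_f1(labels, scores, 0.5))
print(threshold_for_recall(labels, scores, target_recall=1.0))
```

On this toy data the default threshold of 0.5 misses one of the two positive samples, while a lower threshold recovers both at the cost of more false positives. This is the trade-off that a business goal has to settle.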

Keywords/Search Tags:

imbalanced datasets, binary classification, resampling algorithms, Hybrid XGBoost model

Related items

1	Categories Of Unbalanced Data Integration Classification Research
2	Research On Potential Home Broadband User Identification Problem With Large Scale Imbalanced Datasets
3	Credit Card Fraud Risk Detection Based On Hybrid Model Model Study
4	Unbalanced Data Classification Based On Resampling And Hybrid Ensemble
5	Research On The Detection Model Of Credit Card Transaction Fraud Based On GAN-XGBoost
6	Based On The Hybrid Model Of XGBoost Applied Research In Stock Forecasting
7	Research On Video Moving Object Tracking Algorithms Based On Binary Classification
8	Research On Hybrid Recommendation Algorithm Based On XGBoost And SVD
9	Algorithms and analysis for multi-category classification
10	Empirical Research On ESG Multi-Factor Model Based On XGBoost And Other Boosting Algorithms