
Research On Ensemble Learning Algorithm Of Classification Based On Cost-sensitive

Posted on: 2022-09-06  Degree: Master  Type: Thesis
Country: China  Candidate: X F Mei  Full Text: PDF
GTID: 2507306557964339  Subject: Applied Statistics
Abstract:
With the continuous development of data mining, many fields build machine learning models to complete classification tasks. Ensemble learning algorithms can mine data information deeply and are applicable in a wide range of scenarios. However, when an ensemble algorithm is applied to imbalanced data, the objective function assigns the same misclassification cost to every sample, so minority-class samples receive too little attention and the predictions are biased toward the majority class. To improve the performance of ensemble learning on imbalanced data, this thesis adopts cost-sensitive ideas to improve the algorithms: a cost-sensitive loss function replaces the standard objective of the model, and a parameter-optimization algorithm handles the resulting tuning problem. The main research contents are as follows:

(1) For binary classification, the weighted cross-entropy loss replaces the standard cross-entropy objective of XGBoost, and a two-stage grid search is applied to find the optimal parameters. As the objective function, weighted cross-entropy assigns differentiated sample weights to each class, which improves model training and yields a measurable gain in the comprehensive metrics.

(2) Because ensemble learning struggles to mine the information in hard samples, the focal loss function is embedded into XGBoost. First, the modulating factor increases the contribution of hard samples to the overall loss. Second, the weighting factor controls for class imbalance, taking into account the influence of the data distribution on the objective function. Finally, an appropriate parameter-optimization algorithm is used to find the optimal parameter combination on public data sets, and experiments demonstrate the superiority of the improved ensemble model.

(3) Building on cost-sensitive learning, a new LightGBM based on the weighted focal loss and Bayesian optimization is proposed. Bayesian optimization uses prior information to choose the next parameter combination; iteratively updating the surrogate (prior) function and the acquisition function solves the tuning problem of the cost-sensitive ensemble. Experiments show that LightGBM with an embedded focal loss, tuned by Bayesian optimization, significantly improves the comprehensive metrics and efficiently handles a variety of binary classification tasks.
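As a minimal illustration of the weighted focal loss described above, the sketch below implements it with NumPy and returns the (gradient, Hessian) pair that a gradient-boosting custom objective (such as LightGBM's or XGBoost's) expects. The function names and the default values of the weighting factor alpha and modulating factor gamma are illustrative assumptions, not taken from the thesis; the derivatives are approximated by central differences rather than derived in closed form.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_focal_loss(raw_score, label, alpha=0.25, gamma=2.0):
    # Weighted focal loss for binary labels in {0, 1}.
    # alpha up-weights the positive (minority) class; gamma down-weights
    # easy, well-classified samples so that hard samples dominate the loss.
    p = sigmoid(np.asarray(raw_score, dtype=float))
    pos = -alpha * (1.0 - p) ** gamma * np.log(p)
    neg = -(1.0 - alpha) * p ** gamma * np.log(1.0 - p)
    return np.where(np.asarray(label) == 1, pos, neg)

def focal_grad_hess(raw_score, label, alpha=0.25, gamma=2.0, eps=1e-4):
    # Central-difference gradient and Hessian with respect to the raw score:
    # the (grad, hess) pair a boosting library's custom objective must return.
    f = lambda x: weighted_focal_loss(x, label, alpha, gamma)
    grad = (f(raw_score + eps) - f(raw_score - eps)) / (2.0 * eps)
    hess = (f(raw_score + eps) - 2.0 * f(raw_score) + f(raw_score - eps)) / eps ** 2
    return grad, hess
```

In practice this pair would be plugged into the booster's custom-objective hook (for example, older LightGBM versions accept it via the `fobj` argument of `lgb.train`); the exact callback signature depends on the library version. Note that with gamma = 0 and alpha = 0.5 the loss reduces to half the standard cross-entropy, which is a convenient sanity check.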
Keywords: Cost-Sensitive, Ensemble Learning Algorithm, Focal Loss, Imbalanced Data, Bayesian Optimization