
Research On Ensemble Learning Algorithm Of Classification Based On Cost-sensitive

Posted on: 2022-09-06  Degree: Master  Type: Thesis
Country: China  Candidate: X F Mei  Full Text: PDF
GTID: 2507306557964339  Subject: Applied Statistics
Abstract:
With the continuous development of data mining, many fields build machine learning models to complete classification tasks. Ensemble learning algorithms can mine data information deeply and are applicable in a wide range of scenarios. However, when an ensemble algorithm is applied to imbalanced data, the objective function assigns the same misclassification cost to every sample, so minority-class samples receive too little attention and the predictions are biased toward the majority class. To improve the performance of ensemble learning on imbalanced data, this thesis adopts cost-sensitive ideas to improve the algorithms: a cost-sensitive loss function replaces the standard objective of the model, and a parameter-optimization algorithm handles the resulting tuning problem. The main research contents are as follows:

(1) For binary classification, the weighted cross-entropy loss replaces the standard cross-entropy objective of XGBoost, and a two-stage grid search is applied to find the optimal parameters. As the objective function, weighted cross-entropy assigns differentiated sample weights to each class, which improves model training and yields a measurable gain in the comprehensive metrics.

(2) Because ensemble learning struggles to mine the information in hard samples, the focal loss function is embedded into XGBoost. First, the modulating factor increases the contribution of hard samples to the overall loss. Second, the weighting factor controls for class imbalance, taking into account the influence of the data distribution on the objective function. Finally, an appropriate parameter-optimization algorithm is used to find the optimal parameter combination on public data sets, and experiments demonstrate the superiority of the improved ensemble model.

(3) Building on cost-sensitive learning, a new LightGBM based on the weighted focal loss and Bayesian optimization is proposed. Bayesian optimization uses prior information to choose the next parameter combination; iteratively updating the surrogate (prior) function and the acquisition function solves the tuning problem of the cost-sensitive ensemble. Experiments show that LightGBM with an embedded focal loss, tuned by Bayesian optimization, significantly improves the comprehensive metrics and efficiently handles a variety of binary classification tasks.
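As a minimal illustration of the weighted focal loss described above, the sketch below implements it with NumPy and returns the (gradient, Hessian) pair that a gradient-boosting custom objective (such as LightGBM's or XGBoost's) expects. The function names and the default values of the weighting factor alpha and modulating factor gamma are illustrative assumptions, not taken from the thesis; the derivatives are approximated by central differences rather than derived in closed form.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_focal_loss(raw_score, label, alpha=0.25, gamma=2.0):
    # Weighted focal loss for binary labels in {0, 1}.
    # alpha up-weights the positive (minority) class; gamma down-weights
    # easy, well-classified samples so that hard samples dominate the loss.
    p = sigmoid(np.asarray(raw_score, dtype=float))
    pos = -alpha * (1.0 - p) ** gamma * np.log(p)
    neg = -(1.0 - alpha) * p ** gamma * np.log(1.0 - p)
    return np.where(np.asarray(label) == 1, pos, neg)

def focal_grad_hess(raw_score, label, alpha=0.25, gamma=2.0, eps=1e-4):
    # Central-difference gradient and Hessian with respect to the raw score:
    # the (grad, hess) pair a boosting library's custom objective must return.
    f = lambda x: weighted_focal_loss(x, label, alpha, gamma)
    grad = (f(raw_score + eps) - f(raw_score - eps)) / (2.0 * eps)
    hess = (f(raw_score + eps) - 2.0 * f(raw_score) + f(raw_score - eps)) / eps ** 2
    return grad, hess
```

In practice this pair would be plugged into the booster's custom-objective hook (for example, older LightGBM versions accept it via the `fobj` argument of `lgb.train`); the exact callback signature depends on the library version. Note that with gamma = 0 and alpha = 0.5 the loss reduces to half the standard cross-entropy, which is a convenient sanity check.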
Keywords: Cost-Sensitive, Ensemble Learning Algorithm, Focal Loss, Imbalanced Data, Bayesian Optimization