| Credit has become one of the mainstream means of transaction for Chinese residents. Personal credit loans not only ease individuals' capital turnover but have also been a core source of profit for China's banking industry in recent years. As the personal loan business keeps expanding, default events are becoming more frequent. Accurately and efficiently identifying potential default risk, and improving personal loan default prediction models, will therefore support the healthy development of the banking industry in personal credit. In previous research on financial credit risk control, researchers usually classify and predict defaulting users with traditional machine learning models, and either skip feature selection or use only conventional feature screening methods during modeling. This paper therefore focuses on feature selection and on building classification models for imbalanced data. The data come from the 2021 CCF competition. After data preprocessing, we design a combined FIRE-ENET feature selection method: the first step combines random forest feature importance screening with recursive feature elimination, and the second step uses the elastic net to further eliminate redundant features. Compared with the feature selection methods commonly used in financial risk control and with the information gain method, the combined method improves classification accuracy by 1.7% to 2.6% and the F1 value by 18.7% to 33.2%. Second, because the data set is imbalanced, we build balanced sub-samples in four ways, including SMOTE, Borderline-SMOTE, and SMOTE-Tomek. Finally, we train logistic regression and four tree-based ensemble models (random forest, XGBoost, LightGBM, and CatBoost) on each of the four balanced sub-samples and tune their parameters with Bayesian optimization. Experiments show that "SMOTE-Tomek + CatBoost" is the optimal combination, with an AUC of 0.880, an F1 value of 0.589, and a recall of 0.885. SHAP values are then computed from this model and ranked to analyze the key factors affecting default. To further increase the F1 value, we build an ST-Stacking default prediction model that combines the SMOTE-Tomek sampling method with a two-layer Stacking model. Experiments show that this model greatly increases the F1 value, which reaches 0.903. To further verify the model's classification ability under the conditions it may face in practice, the experiment is repeated on the original imbalanced sample set; the results show that the SMOTE-Tomek sampling method improves both the accuracy and the F1 value of the Stacking model. The ST-Stacking default prediction model based on FIRE-ENET feature selection is an ideal loan default prediction model. |
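
The two-step feature selection described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the data are synthetic (not the 2021 CCF competition set), the function name `two_stage_select` and all hyperparameters are hypothetical, and the second step fits scikit-learn's `ElasticNet` regressor on the 0/1 default label as a stand-in for whatever elastic net formulation the authors used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler


def two_stage_select(X, y, n_stage1=10, alpha=0.01, l1_ratio=0.5, seed=0):
    """Hypothetical helper: return column indices kept after each stage."""
    # Stage 1: recursive feature elimination, ranking features by
    # random forest importance and dropping the weakest at each round.
    forest = RandomForestClassifier(n_estimators=50, random_state=seed)
    rfe = RFE(forest, n_features_to_select=n_stage1, step=2).fit(X, y)
    stage1 = np.flatnonzero(rfe.support_)

    # Stage 2 (assumption): elastic net regression on the binary label;
    # features whose coefficient shrinks to zero are treated as redundant.
    X1 = StandardScaler().fit_transform(X[:, stage1])
    enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10000)
    enet.fit(X1, y)
    keep = np.abs(enet.coef_) > 1e-8
    return stage1, stage1[keep]


# Synthetic, imbalanced stand-in for a loan default data set (about 20% defaults).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, n_redundant=5,
    n_clusters_per_class=1, weights=[0.8, 0.2], random_state=0,
)
stage1, stage2 = two_stage_select(X, y)
print("kept after RF importance + RFE:", stage1.tolist())
print("kept after elastic net:", stage2.tolist())
```

In a fuller pipeline, the columns in `stage2` would feed the resampling step (for example SMOTE-Tomek) and then the classifiers; resampling should be applied to the training split only, so that evaluation reflects the original class imbalance.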