| Credit business is one of the core businesses of commercial banks and other financial institutions. Intelligent risk control models are the main technical means of supporting it and one of the main application scenarios of big data algorithms. However, sample imbalance is a common and pressing problem when building a risk control model. This paper establishes a sample imbalance processing model that combines a mixed sampling strategy with a cost-sensitive method: the proportion of positive and negative samples is first adjusted by mixed sampling, and the model is then trained with the cost-sensitive method. The empirical analysis uses the Kaggle competition dataset "Give Me Some Credit"; after feature engineering, 10 variables are selected for modeling. In the modeling process, the imbalanced credit dataset is first processed with the upsampling strategy, the downsampling strategy, and the mixed sampling strategy. Next, the balanced dataset is combined with the Linear SVC, logistic regression, decision tree, and cost-sensitive AdaCost algorithms to construct overdue prediction models. Finally, the models are compared on four evaluation metrics: Recall, AUC, F1, and G-mean. The results show that, compared with the other classification models, the overdue prediction model based on SMOTEENN and AdaCost is better at identifying defaulting customers: its Recall improves significantly, reaching 0.92, and its AUC, F1, and G-mean are all higher than 0.90. |
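
The four evaluation metrics named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's code: the function names (`confusion_counts`, `imbalance_metrics`, `auc_score`) are hypothetical, the positive class is assumed to be label 1 (an overdue customer), G-mean is taken as the geometric mean of recall and specificity, and AUC is computed with the pairwise ranking formulation.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Recall, F1 and G-mean computed from hard class predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # G-mean is high only when both classes are recognised well.
    g_mean = math.sqrt(recall * specificity)
    return {"recall": recall, "f1": f1, "g_mean": g_mean}


def auc_score(y_true, scores, positive=1):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. The pairwise loop is O(n_pos * n_neg), which is
    fine for illustration but slow on large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Accuracy is left out on purpose: when overdue customers are a small minority, a model that predicts "not overdue" for everyone still scores high accuracy, while its Recall and G-mean drop to zero.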