Research On Personal Credit Scoring Model Based On SMOTE Oversampling Method

Posted on: 2020-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: W J Wang
Full Text: PDF
GTID: 2439330575960977
Subject: Applied Statistics
Abstract/Summary:
With the boom in Internet finance, the scale of the personal credit business and the areas it covers are expanding. Internet finance can break through time and geographical restrictions and provide faster financial services to customers with financing needs. At the same time, it brings problems such as credit risk and user fraud, which call for an accurate personal credit scoring model to improve risk control. Based on the personal credit data provided by financial institutions for the “Taiwan Futures Cup” statistical modeling competition, this paper improves the personal credit scoring model in two respects, balancing the data structure and combining models, with the aim of improving the risk control level of personal credit approval.

Balanced data structure: The ratio of “overdue” samples to “not overdue” samples is 1:15, a serious data imbalance. This paper balances the two classes to 1:1 by SMOTE oversampling of the “overdue” samples. A Logistic regression model, an XGBoost decision tree model and an XGB-Logistic combination model were established on the data before balancing; a SMOTE-Logistic regression model, a SMOTE-XGBoost decision tree model and an XGB-SMOTE-Logistic combination model were established on the balanced data. Comparing the AUC of the six models shows that the AUC of each of the three models built on balanced data is significantly higher than that of its counterpart built on unbalanced data. In terms of accuracy rate, recall rate and PR curve, the sample prediction ability of the three models built on balanced data is also significantly improved. It is therefore concluded that a personal credit scoring model constructed on balanced data has a better prediction effect.

Combination model: Based on the balanced data, this paper compares the basic ideas, application conditions, prediction accuracy and interpretability of the mature personal credit scoring methods used at home and abroad. It finds that the single SMOTE-Logistic regression model has the strongest interpretability but a low AUC, while the single SMOTE-XGBoost decision tree model has a higher AUC but poor interpretability. Based on this analysis, the paper treats the prediction result pred2 of the SMOTE-XGBoost decision tree model as a “first impression” impre2 of the customer, adds impre2 as an additional explanatory variable to the SMOTE-Logistic regression model, and so establishes a new XGB-SMOTE-Logistic combination model. Evaluating the recall rate, accuracy rate, PR curve, AUC and interpretability of each model shows that the prediction accuracy of the XGB-SMOTE-Logistic combination model is significantly higher than that of the single SMOTE-Logistic regression model, and that its interpretability is stronger than that of the single SMOTE-XGBoost decision tree model. The XGB-SMOTE-Logistic combination model therefore has the best overall performance, which makes it of practical value to financial institutions.
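
The abstract does not spell out the oversampling procedure, so the following is only a minimal sketch of the general SMOTE idea it relies on: each synthetic “overdue” sample is placed at a random point on the segment between an existing minority sample and one of its nearest minority neighbours. The function name `smote`, the parameters `k` and `seed`, and the two-feature toy data are all assumptions made for illustration; none of them come from the thesis, and the author's actual implementation and settings are unknown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper: a plain-Python illustration of SMOTE-style
    interpolation, not the thesis author's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of base (excluding base).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (made up) mirroring the 1:15 class ratio reported in the abstract.
data_rng = random.Random(1)
overdue = [[data_rng.gauss(2.0, 0.5), data_rng.gauss(2.0, 0.5)] for _ in range(4)]
not_overdue = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(60)]

# Oversample the minority class until the two classes are 1:1.
balanced_overdue = overdue + smote(overdue, len(not_overdue) - len(overdue), k=3)
print(len(balanced_overdue), len(not_overdue))  # prints: 60 60
```

In practice a library implementation, for example the `SMOTE` class in imbalanced-learn, would normally be used instead of a hand-written loop; the sketch is only meant to make the interpolation step concrete.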
Keywords/Search Tags: credit score, SMOTE oversampling, SMOTE-Logistic, SMOTE-XGBoost, XGB-SMOTE-Logistic