
Research On The Prediction Of Personal Credit Loan Default Risk Based On Ensemble Learning

Posted on: 2023-08-03; Degree: Master; Type: Thesis
Country: China; Candidate: C Y Sun; Full Text: PDF
GTID: 2530306842471784; Subject: Applied Statistics
Abstract/Summary:
With the development of China’s economy, the lending business of commercial banks has grown steadily. However, some borrowers take out loans without regard to their repayment ability, or obtain them fraudulently, leaving banks with uncollectible loans and causing reduced revenue or even losses. For the future development of commercial banks’ credit business, it is therefore important to identify potential defaulters accurately and reduce the non-performing loan rate, while keeping the loss of "good users" low. To this end, this paper uses loan data from credit users of several commercial banks and applies ensemble learning to build an optimal model suited to strict loan default risk control scenarios. The main research work is as follows: (1) First, the group portrait of credit users is analyzed with descriptive statistics, and the distribution of missing values in the original dataset is examined and the missing values are filled. New features are derived from the meaning of the existing features and the relationships between them. The LightGBM-RFECV feature selection method is then used to screen the features, and the 80 variables ranked first in feature priority are kept as the optimal feature subset for modeling. (2) Second, to address the class imbalance in the data, seven data-level methods for handling imbalanced data are each applied to the training set and combined with four classifiers (Logistic Regression, Random Forest, XGBoost and LightGBM) to make predictions. After evaluating each model on each metric, it is concluded that the combination of RENN with the four models gives the best overall performance. (3) Next, the data processed with RENN undersampling are fed into an integrated model that uses Random Forest, XGBoost and LightGBM as first-layer primary learners and Logistic Regression as the second-layer secondary learner for Stacking model fusion prediction. The results show that, compared with the single models, the RENN-Stacking model improves Recall, F2-score and AUC and reduces the rate of non-performing loan users, the metrics considered most important in this paper. The RENN-Stacking model identifies 91.4% of defaulters and minimizes the rate of non-performing loan users, while keeping the loss of "good users" low. With this optimal model, the default risk of commercial banks’ personal credit loans can be greatly reduced. (4) Finally, the user characteristics that deserve more attention during loan approval are selected by ranking LightGBM feature importance, which improves the efficiency of loan approval.
Keywords/Search Tags: Loan default, Imbalanced data, Ensemble learning, Stacking model fusion