
Reject Inference In Credit Scoring Based On Semi-supervised Learning

Posted on: 2022-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Hu
Full Text: PDF
GTID: 2568306326973909
Subject: Quantitative Economics
Abstract/Summary:
Sample bias in credit scoring has attracted increasing attention from both academics and practitioners. Lenders usually observe outcomes only for approved loans but apply the credit scoring model to the whole applicant population. Because the distribution of the acceptances is very different from that of the whole sample, bias accumulates over successive iterations of the credit scoring model. Reject inference aims to reduce this sample bias and improve model performance in credit scoring. Under different missing-data mechanisms (such as missing not at random, MNAR), several statistical methods have been the traditional way to incorporate information from the rejected applicants, but they have shown only limited advantages.

In this paper, inspired by Propensity Score Matching in causal inference and by the semi-supervised iterative self-learning method, we propose two new reject inference techniques: a semi-supervised approach based on matching (psmRI), and an iterative self-learning approach based on matching and iterative semi-supervised clustering (psmCSL). First, we fit a logistic regression to obtain the default probability, which is used, along with the features, for K-Nearest Neighbor matching to identify homogeneous acceptances and rejections; only the rejections that match acceptances are kept. psmRI assigns labels to these rejections according to the labels of their matched acceptances. Second, we cluster the acceptances and filter the rejections again by comparing each rejection's distance to the cluster center with the cluster's radius. Third, psmCSL uses the selected rejections and the original acceptances to build an iterative self-learning model.

In the simulation study, we design different data generation rules and missing mechanisms to mimic the variety of data types found in practice, and generate data sets of different sample sizes. Across six comparative experiments on five reject inference methods, the simulation results show that (1) reject inference methods are advantageous for small samples, which makes them suitable for new loan products; (2) iterative methods show advantages in AUC, accuracy and stability under all data generation rules and missing mechanisms; (3) using the default probability in Propensity Score Matching outperforms using the features; and (4) a biased test set containing only acceptances underestimates model performance, but only slightly.

We test the two reject inference approaches with logistic regression and XGBoost models on real consumer loan data sets from a lending agency in the US. With the logistic model, psmRI based on default probability and features outperforms the traditional reject inference methods in AUC, and outperforms the logistic model built without reject inference in AUC, KS, accuracy and other measures. With the XGBoost model, psmRI based on default probability obtains the highest AUC, while psmCSL based on default probability and features reaches the highest accuracy and the lowest type II error, lowering the risk of accepting loans with high default risk; its advantage in AUC is limited, however, because errors accumulate during iteration. Furthermore, significance tests show that the two best models statistically outperform the benchmark model (trained only on the acceptances).

Our research finds that inferring the labels of rejected applicants through semi-supervised matching and iterative self-learning helps mitigate the sample bias problem and improves predictive accuracy, which is meaningful for building a less biased, more accurate and stronger credit scoring model.
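The matching and cluster-filtering steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `match_rejects` and `filter_by_cluster_radius`, the `caliper` threshold used to decide whether a rejection "matches", and the nearest-center rule are all assumptions made for illustration. Default probabilities, cluster centers and radii are taken as given inputs (for example, probabilities from a logistic regression fitted on the acceptances), and matching here uses the default probability only, with a single nearest neighbor.

```python
from bisect import bisect_left
from math import dist


def match_rejects(acc_scores, acc_labels, rej_scores, caliper=0.05):
    """Match each rejection to the acceptance with the closest default
    probability (1-nearest neighbor in one dimension).

    A rejection is kept only if the score gap is at most `caliper`
    (hypothetical threshold); it then inherits the matched acceptance's
    label. Returns {rejection_index: inferred_label}.
    """
    if not acc_scores:
        return {}
    # Sort acceptances by score so each lookup is a binary search.
    order = sorted(range(len(acc_scores)), key=lambda i: acc_scores[i])
    sorted_scores = [acc_scores[i] for i in order]

    inferred = {}
    for j, score in enumerate(rej_scores):
        pos = bisect_left(sorted_scores, score)
        # The nearest neighbor sits just below or just above `pos`.
        candidates = [p for p in (pos - 1, pos) if 0 <= p < len(sorted_scores)]
        best = min(candidates, key=lambda p: abs(sorted_scores[p] - score))
        if abs(sorted_scores[best] - score) <= caliper:
            inferred[j] = acc_labels[order[best]]
    return inferred


def filter_by_cluster_radius(rejects, centers, radii):
    """Return indices of rejections that lie within the radius of their
    nearest cluster center (assumed rule; centers and radii are inputs
    computed beforehand from the acceptances)."""
    kept = []
    for j, x in enumerate(rejects):
        k = min(range(len(centers)), key=lambda c: dist(x, centers[c]))
        if dist(x, centers[k]) <= radii[k]:
            kept.append(j)
    return kept


if __name__ == "__main__":
    # Toy data: three acceptances with known labels (1 = default).
    labels = match_rejects(
        acc_scores=[0.10, 0.20, 0.80],
        acc_labels=[0, 0, 1],
        rej_scores=[0.12, 0.50, 0.78, 0.95],
    )
    print(labels)  # {0: 0, 2: 1}: rejections 1 and 3 have no close match

    kept = filter_by_cluster_radius(
        rejects=[(0.5, 0.5), (5.0, 5.0), (11.0, 11.0)],
        centers=[(0.0, 0.0), (10.0, 10.0)],
        radii=[1.0, 2.0],
    )
    print(kept)  # [0, 2]
```

Only the rejections that survive both filters would then be added, with their inferred labels, to the acceptances for the iterative self-learning step. Matching on the features, or on the default probability and the features together, would replace the one-dimensional search with a multi-dimensional nearest-neighbor search.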
Keywords/Search Tags: Reject inference, Credit scoring, Semi-supervised Iterative Learning, Propensity Score Matching