
Research On Credit Evaluation Model Based On Semisupervised Random Forest

Posted on: 2022-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: P P Zhang
Full Text: PDF
GTID: 2480306521981509
Subject: Quantitative Economics
Abstract/Summary:
In recent years, with the rapid development of China's market economy, credit consumption through products such as Huabei, Jiebei, and Baitiao has become a dominant form of personal consumption, and buying first and paying later has become commonplace among young people. P2P online lending platforms have also grown rapidly alongside the economy. However, the repeated collapse of P2P platforms in recent years has gradually exposed the industry's risks, and P2P lending has been criticized for its credit problems. To identify applicants likely to default and to reduce the risk of loan defaults, financial institutions most commonly rely on credit scoring models. Because each institution builds its model only on samples of accepted applicants, the training sample differs from the population the model is applied to, which biases the model's parameter estimates. The credit performance of rejected applicants therefore needs to be inferred during modeling. Reject inference aims to reduce this sample bias and improve model performance in credit scoring.

Building on previous research, this thesis proposes a new reject inference technique, called an editing-based semi-supervised random forest. First, the random forest algorithm is used to infer the credit performance of the rejected applicants. Second, the KNN algorithm is used to reduce noise in the sample of rejected applicants according to the reject inference conditions. Finally, the weight of each sample in the set obtained in the second step is adjusted according to the confidence of each rejected applicant sample, and the steps above are repeated on the adjusted sample set until the set of rejected applicant samples no longer changes.

The experimental results show that credit evaluation models based on traditional reject inference methods, such as reweighting and extrapolation, perform worse than a model built only on accepted applicants. This is because the sample of accepted applicants does not fully represent the whole population of applicants, and rejected applicants whose credit status is inferred incorrectly degrade the model's performance. In contrast, the credit evaluation model based on the semi-supervised random forest proposed here performs significantly better than the other reject inference methods, indicating that reject inference helps reduce sample bias and that the noise reduction step further improves model performance.

Because bad accounts and good accounts differ greatly in number in real loan data sets, class imbalance affects the performance of the credit evaluation model. This thesis therefore explores how different ratios of good to bad accounts affect the model's predictions. The experimental results show that as the ratio of positive to negative samples becomes more balanced, ensemble learners such as random forest and XGBoost gradually show their advantages. When the ratio of bad accounts to good accounts is between 1:1 and 1:4, these ensemble models perform significantly better than single learners such as logistic regression and support vector machines, and they perform best at a ratio of 1:1. Although ensemble models such as XGBoost outperform single learners such as logistic regression and support vector machines, their complexity makes them hard to interpret, and the financial field places very high demands on model interpretability. At the end of this thesis, SHAP values are used to explain the XGBoost model, making it as interpretable as a simple model.
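
The three steps of the editing-based semi-supervised random forest described in the abstract can be sketched roughly as follows. This is only an illustrative sketch using scikit-learn, not the thesis's actual implementation: the function name `edited_self_training_rf`, its parameters, the editing rule (keep a rejected sample only when its inferred label agrees with a KNN vote over the accepted applicants), and the use of the predicted class probability as the confidence weight are all assumptions, since the abstract does not spell these details out.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier


def edited_self_training_rf(X_acc, y_acc, X_rej, k=5, max_iter=10, seed=0):
    """Hypothetical sketch: reject inference by self-training with KNN editing.

    X_acc, y_acc: features and observed labels of accepted applicants.
    X_rej: features of rejected applicants (no observed labels).
    Returns the final forest, the boolean mask of rejected samples that
    survived editing, and the inferred labels for all rejected samples.
    """
    n_acc = len(X_acc)
    # Start from a forest trained on accepted applicants only.
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(X_acc, y_acc)
    # Assumed editing reference: a KNN vote over the accepted applicants.
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_acc, y_acc)
    knn_vote = knn.predict(X_rej)

    prev_kept, prev_labels = None, None
    for _ in range(max_iter):
        # Step 1: infer the credit performance of rejected applicants.
        proba = rf.predict_proba(X_rej)
        pseudo = rf.classes_[proba.argmax(axis=1)]
        confidence = proba.max(axis=1)

        # Step 2: editing, i.e. drop rejected samples whose inferred label
        # disagrees with the KNN vote (treated here as noise).
        kept = knn_vote == pseudo

        # Stop once the retained rejected set and its labels no longer change.
        if prev_kept is not None and np.array_equal(kept, prev_kept) \
                and np.array_equal(pseudo[kept], prev_labels):
            break
        prev_kept, prev_labels = kept, pseudo[kept]

        # Step 3: retrain with accepted samples at weight 1 and retained
        # rejected samples weighted by their inferred-label confidence.
        X_all = np.vstack([X_acc, X_rej[kept]])
        y_all = np.concatenate([y_acc, pseudo[kept]])
        w_all = np.concatenate([np.ones(n_acc), confidence[kept]])
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        rf.fit(X_all, y_all, sample_weight=w_all)

    return rf, kept, pseudo
```

In this sketch the accepted applicants always keep weight 1, so a low-confidence rejected sample can influence the forest only weakly. Other editing rules, for example a KNN vote over both accepted and retained rejected samples, would fit the abstract's description equally well.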
Keywords/Search Tags: credit scoring, reject inference, semi-supervised random forest, class imbalance, interpretability