Prediction Of Loan Default Risk Based On Ensemble Learning

Posted on: 2023-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: C J Chen
GTID: 2568306617960219
Subject: Probability theory and mathematical statistics
Abstract/Summary:
In recent years, with the rapid development of China's Internet technology and financial industry, personal loan businesses such as housing loans, car loans, and Internet loans have grown rapidly. These businesses have greatly promoted China's economic development, but they also conceal a series of risks. Among these, the credit risk assessment of loan applicants is a key link in the development of consumer credit business. However, lending institutions and commercial banks can usually evaluate an applicant's repayment ability only from the personal information and proof of income that the applicant provides, and it is difficult for them to audit and investigate the applicant's personal credit status in detail. Fraudulent behavior is therefore inevitable, which may lead to default and bring a series of risks to business development. Using the loan record data set of a credit platform, this paper first carries out data cleaning and feature engineering, including the preprocessing of missing values. The SMOTE oversampling algorithm is used to address class imbalance. To account for the correlation between the sample features and the target class variable, the random forest model and the mutual information method are used to calculate how much each feature reduces the uncertainty of the class assignment, given the class to which each sample belongs. This measures feature importance, which is then used to filter the features, eliminate the influence of noise, and reduce model complexity. Secondly, random forest, extremely randomized trees, and improved variable-random trees models are constructed to evaluate and predict credit default, and the results are compared with those of several single models. Overall, the ensemble tree models predict clearly better than the single models, and the improved variable-random trees model achieves relatively good predictive performance. The improved variable-random trees model proposed in this paper uses a controllable combination of deterministic and random selection to induce the underlying decision trees, constructing a transition model from full determinism to complete randomization. This can effectively increase the complementarity among the ensemble's member models and thereby improve prediction accuracy. On the problem of credit default assessment, both the construction speed and the predictive performance of the model are improved, which provides a new feasible scheme for risk assessment on online loan platforms and in the credit business of major commercial banks.
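The mutual information criterion mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name, the feature names, and the toy data are hypothetical, and it assumes features have already been discretized. It scores a feature by I(X; Y) = sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) p(y))), which is the reduction in uncertainty about the default label once the feature is known.

```python
from collections import Counter
from math import log2


def mutual_information(feature, labels):
    """Mutual information I(X; Y), in bits, between two discrete sequences."""
    if len(feature) != len(labels):
        raise ValueError("feature and labels must have the same length")
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of each (x, y) pair
    px = Counter(feature)                  # marginal counts of x
    py = Counter(labels)                   # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: 1 = default, 0 = repaid.
default = [1, 1, 0, 0, 1, 0, 1, 0]
features = {
    # Assumed feature that happens to match the label exactly.
    "overdue_before": [1, 1, 0, 0, 1, 0, 1, 0],
    # Assumed feature that is statistically independent of the label here.
    "region_code": [0, 1, 0, 1, 0, 1, 1, 0],
}

# Rank features from most to least informative about the label.
scores = {name: mutual_information(col, default) for name, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

In a filtering step of this kind, features whose score falls below a chosen threshold would be dropped before the tree ensembles are trained. The threshold itself is a modelling choice that the abstract does not specify.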
Keywords/Search Tags: Credit Evaluation, Ensemble Learning, Random Forest, Extremely Randomized Trees, Variable-Random Trees