Research On Personal Credit Evaluation Based On Random Forest Model

Posted on: 2021-08-04 | Degree: Master | Type: Thesis
Country: China | Candidate: J He | Full Text: PDF
GTID: 2480306461973829 | Subject: Business Statistics
Abstract/Summary:
Since the start of the "Internet +" era, credit consumption has become part of everyday life as a new lifestyle. According to relevant data, more and more residents are beginning to use loans for consumption. At the current stage, China is at a point where cash is dominant, and defaults and fraud keep emerging. Judging whether a borrower is reliable, whether a default will occur, and how to choose the optimal classifier model is an important and difficult task. The purpose of this research is to analyze the indicators that affect personal credit when customers borrow from financial institutions such as P2P platforms, to establish a personal credit risk indicator system, and to build models on that indicator system to classify borrowers. The ultimate aim is to determine which classifier model is more efficient and more applicable to personal credit risk assessment.

This article takes personal credit risk assessment as its research object. First, the author reviews the literature and draws on domestic and foreign personal credit assessment index systems to establish the index system used here. Second, the related theory of random forests and imbalanced datasets is set out in detail, and the advantages of random forests, their scope of application, and the causes of and remedies for imbalanced datasets are summarized. The article proposes using the F value, G value, AUC value, and the rate of change of accuracy to measure the accuracy and stability of a model built on an imbalanced dataset. Then, using the real Lending Club platform dataset for the fourth quarter of 2018, and after cleaning, transforming, selecting feature variables, and splitting the dataset, a random forest model is built on the imbalanced dataset. The SMOTE+ENN hybrid sampling algorithm is also introduced to rebalance the original imbalanced dataset, and a second random forest model is built on the rebalanced dataset. Comparing the two results shows that hybrid sampling and balancing of the original dataset improve both the prediction accuracy and the stability of the random forest model.

Finally, the random forest model, the logistic regression model, and the support vector machine model are compared comprehensively on their personal risk assessment results on the balanced dataset. Repeated empirical analysis shows that the prediction accuracy of the random forest model is significantly higher than that of the other two models, while its stability lies between theirs. Weighting the accuracy ranking (measured by the F value, G value, and AUC value) and the stability ranking (measured by the rate of change of accuracy) in a comprehensive evaluation shows that the classification performance of the random forest is better than that of the other two models.

The research results show that many indicators affect personal credit risk assessment and that the dataset suffers from class imbalance. Therefore, when evaluating a customer's personal credit risk, balancing the dataset can significantly improve the prediction accuracy and robustness of the classifier model. In addition, the comprehensive comparison of the random forest, logistic regression, and support vector machine models on the balanced dataset concludes that the random forest model has the best classification performance of the three. This supports the efficiency and applicability of the random forest model for personal credit risk assessment.
Keywords/Search Tags:personal credit risk assessment, SMOTE+ENN mixed sampling, random forest model, logistic regression model, support vector machine model
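
The abstract names the F value, G value, and AUC value as accuracy measures for imbalanced data but does not give their formulas. The sketch below is an illustration only, assuming the common definitions: F as the harmonic mean of precision and recall, G as the geometric mean of sensitivity and specificity, and AUC as the probability that a randomly chosen defaulter is scored above a randomly chosen non-defaulter. The function names and the toy counts are hypothetical and are not taken from the thesis.

```python
from math import sqrt


def f_value(tp, fp, fn):
    """F value (assumed here to be F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_value(tp, fn, tn, fp):
    """G value (assumed here to be G-mean): sqrt(sensitivity * specificity)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


def auc_value(scores, labels):
    """AUC by pairwise comparison; label 1 marks the positive (default) class."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced case: 10 defaulters, 90 non-defaulters, and a classifier
# that labels everyone "non-default". Plain accuracy is 0.9, yet F and G are 0.
always_negative = {"tp": 0, "fn": 10, "tn": 90, "fp": 0}
accuracy = (always_negative["tp"] + always_negative["tn"]) / 100
f_trivial = f_value(always_negative["tp"], always_negative["fp"], always_negative["fn"])
g_trivial = g_value(**always_negative)

# A classifier that finds 8 of the 10 defaulters with 2 false alarms.
f_better = f_value(tp=8, fp=2, fn=2)
g_better = g_value(tp=8, fn=2, tn=88, fp=2)

# AUC on four hypothetical scored borrowers.
auc_toy = auc_value([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The toy case illustrates why plain accuracy misleads on imbalanced data: a classifier that never predicts a default still reaches 0.9 accuracy, while its F and G values are both zero.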