Research On Default Risk Identification Of Internet Car Loan Customers Based On Random Forest Model

Posted on: 2021-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: J X Gong
GTID: 2392330626962581
Subject: Applied Statistics
Abstract/Summary:
With the advent of the Internet era, many traditional industries in China are actively transforming, and the auto finance market is no exception. In recent years, the "Internet + auto finance" model has emerged. This form of car loan, which relies on an Internet platform, not only simplifies the offline approval process but also provides customers with personalized services by drawing on a "people + car" database. While this innovative Internet-based model brings new opportunities to car loan platforms, it also brings problems and lending risks to the platforms and their funders. Of these, the major car loan platforms are most concerned with identifying customers' default risk. The high-dimensional feature information submitted by car loan customers currently has to be sorted manually into loan orders, but manual review is costly, inefficient and error-prone. As time passes and customers accumulate, identifying default-risk customers from massive data increases the workload of reviewers and slows down loan decisions. This thesis therefore uses data mining to uncover the patterns hidden in the data and trains models on existing data, with the aim of applying a model that identifies default customers efficiently and accurately to the car loan platform, thereby reducing the platform's lending risk and the cost of manual screening and improving the efficiency of lending decisions.

A web crawler is used to collect data from the car loan platform, and the data are split into training and test sets after cleaning and preprocessing. Following the idea of ensemble learning, the random forest model is the main method used to identify default customers, with the commonly used CART decision tree and logistic regression added for comparison. Model performance is compared using each model's confusion matrix and ROC curve.

The following conclusions are reached. From the algorithmic point of view, the random forest model performs best: its Type II error rate is lower than that of the CART decision tree and logistic regression, and its AUC is higher than that of both. The ten most important feature variables, ranked by Mean Decrease Accuracy and Mean Decrease Gini, include the loan amount, loan interest rate, years of employment, whether the applicant holds a local household registration, loan term, type of employer, age, number of applications and credit report. The initial random forest is then optimized through parameter tuning, which further improves the model's recognition accuracy and provides a reference for Internet car loan platforms seeking to identify default customers. The thesis closes with a summary and an outlook for future work.
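The two evaluation measures named in the abstract, the Type II error rate from the confusion matrix and the AUC of the ROC curve, can be illustrated with a minimal sketch in plain Python. This is not the thesis's code. It assumes that label 1 means "default" and is treated as the positive class, so a Type II error is a defaulting customer predicted as non-default. The function names, the 0.5 threshold and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), assuming 1 = default (positive) and 0 = non-default."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0: a missed defaulter
            fn += 1
    return tp, fp, tn, fn


def type_ii_error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual defaulters that the model labels as non-default: FN / (FN + TP)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return fn / (fn + tp) if (fn + tp) else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random defaulter scores higher than a
    random non-defaulter (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and a model's predicted default probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.6, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(confusion_counts(y_true, y_pred))    # (3, 1, 3, 1)
print(type_ii_error_rate(y_true, y_pred))  # 0.25
print(auc(y_true, scores))                 # 0.875
```

Under this convention, a lower Type II error rate means fewer defaulters slip through as approved loans, while the AUC summarizes ranking quality across all thresholds, which is presumably why the abstract reports both.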
Keywords/Search Tags: decision tree, random forest, logistic regression, internet car loan, default