
Comparative Verification Of Network Credit Default Recognition Under Different Classification Models

Posted on: 2021-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: T Zhang
GTID: 2480306245481624
Subject: Applied Statistics
Abstract/Summary:
With the help of the Internet, online credit occupies a growing share of the global economic system, and the global P2P online lending transaction volume shows an exponential upward trend. Online lending platforms connect a large pool of idle private funds on one side with small and medium-sized enterprises or individuals who need financing on the other. To a certain extent, this makes up for the limited coverage of the traditional credit industry. However, a surge in business volume does not mean that lending platforms are mature. How to approve and accept credit applications quickly and accurately on massive data has become the primary concern of every platform. From the perspective of online lending companies, building a reasonable risk assessment model to track each transaction or flow of funds is key to whether a company can keep operating in the long term. Credit default identification is a typical binary classification problem. This thesis takes as its research object the loan data of the American credit platform Lending Club from the first quarter of 2016 to the fourth quarter of 2018. First, the data set is carefully preprocessed. Because default samples and normal samples in the data set are severely imbalanced, the improved DB-MCSMOTE method is then used to oversample the data, producing a balanced sample data set. Next, logistic regression, LightGBM and CatBoost models are each trained on this loan data, and Bayesian optimization is used to tune the parameters of the three models so that each individual model achieves better classification performance on the test set, which covers the full year of 2019. Finally, the three models are compared in terms of accuracy, stability and interpretability. The conclusions show that the AUC on the validation set is significantly higher with the oversampled data than with the original data, indicating that the oversampled data set provides more classification information to the model. The ensemble models run faster and perform better, and their advantages in accuracy and interpretability make up for their weaker stability; overall, the CatBoost model outperforms the other models. In addition, Bayesian optimization takes significantly less time and fewer resources during parameter tuning than the traditional parameter tuning method.
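The oversampling and AUC comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's DB-MCSMOTE method, whose details are not given here; it uses the plain SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours. The function names `smote_like_oversample` and `auc`, the parameter `k`, and the toy data are all assumptions made for illustration.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (plain SMOTE idea, NOT DB-MCSMOTE).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 default (minority) and 12 normal (majority) samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority = [[float(i % 4), float(i // 4)] for i in range(12)]

# Add enough synthetic defaults to balance the two classes.
synthetic = smote_like_oversample(minority, len(majority) - len(minority))
balanced_minority = minority + synthetic
print(len(balanced_minority), len(majority))

# AUC of a hypothetical scorer on four labelled samples.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

In a workflow like the one the abstract describes, oversampling would be applied to the training data only, and the AUC would be computed on held-out data that keeps its original class balance.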
Keywords/Search Tags: Online credit lending, default recognition, ensemble learning, LightGBM, CatBoost