Research On Credit Default Of Bank Customers Based On Data Mining

Posted on: 2021-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Liu
Full Text: PDF
GTID: 2480306248455734
Subject: Applied Statistics
Abstract/Summary:
With the continuing expansion of consumer credit businesses such as bank mortgages, car loans, and small personal loans, banks face increasing credit risk. A major cause of this risk is customer credit default. Banks therefore need to identify such customers in a reasonable and scientific manner, and to prevent and control potentially risky customers in advance. In the information age, using data mining to study customer data effectively and to build an efficient bank credit default assessment model is also a new opportunity for banks to control credit risk. This thesis starts from the processing and analysis of bank customer credit data published on the well-known competition website Kaggle. The empirical analysis takes whether a bank loan customer defaults as the binary dependent variable, and uses customer gender, marital status, loan amount, credit history, and other variables as explanatory variables. After dummy variables are introduced into the original data set and missing and special values are preprocessed, an exploratory analysis is performed to give a simple assessment of how strongly each input variable affects the default outcome. The sample is then divided into a training set and a test set at a ratio of 7:3, and three models (logistic regression, random forest, and AdaBoost) are fitted on the training set. The generalization ability of the three models on the test set is compared and evaluated using the confusion matrix, accuracy, AUC value, and other classification evaluation criteria. The confusion matrices and cross-validation results show that all three models are feasible. Taking the evaluation indicators together, the bank credit default model built with logistic regression performs best and yields the most interpretable results; random forest ranks second, and the classification performance of the AdaBoost model is relatively weak.
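
The 7:3 split and the evaluation criteria named in the abstract (confusion matrix, accuracy, AUC) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the thesis's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example, and the scores stand in for the predicted default probabilities that any of the three fitted models would supply.

```python
import random

def train_test_split_70_30(rows, seed=0):
    """Shuffle a copy of rows and split it 7:3 into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 means default."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def accuracy(y_true, y_pred):
    """Share of customers whose predicted label matches the true label."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tn + tp) / (tn + fp + fn + tp)

def auc(y_true, scores):
    """AUC as the probability that a randomly chosen defaulter receives a
    higher score than a randomly chosen non-defaulter (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy test-set labels and model scores (hypothetical values, not thesis data).
y_true = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9, 0.05, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_matrix(y_true, y_pred))
print(accuracy(y_true, y_pred))
print(auc(y_true, scores))
```

On this toy data the confusion matrix is (5, 1, 1, 3), the accuracy is 0.8, and the AUC is 22/24, roughly 0.917. Comparing models as the abstract describes would mean computing the same three quantities from each model's test-set scores.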
Keywords/Search Tags:data mining, customer default, logistic regression, random forest, AdaBoost