
Research On Credit Card Default Classification Based On Data Mining

Posted on: 2021-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: J H Wang
Full Text: PDF
GTID: 2558306917482034
Subject: Applied Statistics
Abstract/Summary:
Credit cards are one of the main lines of business of modern commercial banks. Their widespread adoption makes shopping more convenient for consumers and brings large profits to commercial banks. At the same time, the credit card default risk faced by banks has increased and has caused heavy losses at many banks. Previous research on credit card default classification has mainly used traditional data mining models, and the evaluation metrics it adopted are also flawed. This thesis builds a BP-AdaBoost classification model that uses the characteristics of credit card users, their repayment status in previous months, their bill amounts, and the amounts they repaid to predict whether a user will default in the next month. The model's predictions are compared with the true values to verify its classification performance. Such a model can reduce the risk of user default and strengthen a bank's ability to identify users who will default.

This thesis analyzes the credit card default data set from UCI to address the problem of classifying credit card user defaults. First, the data are cleaned and examined through exploratory statistical analysis. The data are then preprocessed through feature engineering: principal component analysis is used to reduce the dimensionality of correlated independent variables, and a Box-Cox transformation is applied to numerical independent variables that are greater than 0 and do not follow a normal distribution. Second, classification models such as SVM and Random Forest are fitted to the data sets before and after preprocessing, both to verify the effect of preprocessing and to find the model with the best classification performance. Finally, a BP-AdaBoost classification model is built, evaluated on the preprocessed data set, and compared with all of the previous models. Because default users make up only a small share of the data, the F1-Score and AUC (area under the ROC curve) of the default class are used to measure classification performance, which reduces the effect of this imbalance on the evaluation.

The experiments show that: (1) feature engineering effectively improves the discriminative power of the models, whose performance on the preprocessed data set is significantly better than on the original data set; (2) the best-performing classical classification model on this problem is Random Forest, which takes little time and discriminates well; (3) the F1-Score and AUC of the BP-AdaBoost model are higher than those of all previous models, so it classifies better and can effectively identify default users.
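
The choice of the F1-Score and AUC of the default class as evaluation metrics can be illustrated with a minimal sketch. It is not the thesis's code: the labels and scores below are invented toy values (1 = default, 0 = no default), the function names are hypothetical, and the 0.5 decision threshold is an assumption. It only shows why plain accuracy can mislead when default users are a small minority.

```python
def accuracy(y_true, y_pred):
    """Share of users whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_default_class(y_true, y_pred):
    """F1-Score of the positive class (1 = default)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no default user was identified correctly
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen default user
    receives a higher score than a randomly chosen non-default user
    (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one user of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 8 non-default users, 2 default users.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical model scores, read as "estimated probability of default".
scores = [0.10, 0.20, 0.15, 0.30, 0.05, 0.40, 0.60, 0.25, 0.70, 0.35]

# Predictions from the scores at an assumed threshold of 0.5.
y_pred_model = [1 if s >= 0.5 else 0 for s in scores]
# A trivial baseline that never predicts default.
y_pred_never = [0] * len(y_true)

print("accuracy (model):   ", accuracy(y_true, y_pred_model))
print("accuracy (baseline):", accuracy(y_true, y_pred_never))
print("F1 (model):         ", f1_default_class(y_true, y_pred_model))
print("F1 (baseline):      ", f1_default_class(y_true, y_pred_never))
print("AUC (model scores): ", auc(y_true, scores))
```

On this toy data the baseline that never predicts default reaches the same accuracy as the scored model, yet its F1-Score for the default class is zero. This gap is what accuracy hides on imbalanced data and what the default-class metrics expose.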
Keywords/Search Tags: Credit card default classification, feature engineering, data mining, F1-Score, BP-AdaBoost