Customer Default Prediction In Lending Club Data Based On Classification Integration

Posted on: 2021-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: X Liu
GTID: 2370330626961116
Subject: Applied statistics
Abstract/Summary:
In recent years, with the rapid development of the Internet, Internet financial products have grown quickly and large lending platforms have gradually emerged. Lending Club is one of the fast-growing and well-operated large P2P trading platforms. Its low trading threshold, simple process, and high return on investment have attracted a large number of customers into the market quickly, and have also given rise to illegal loans. In view of this, this paper uses Lending Club's loan data from October 1, 2018 to December 31, 2018 for modeling and analysis. Risk assessment with an integrated classification prediction method can improve a P2P platform's ability to identify customers with a high default rate, so the method can provide a scientific decision-making basis for the platform and the company. This paper mainly uses machine learning algorithms for evaluation and prediction. First, the imbalanced data are oversampled with the Synthetic Minority Oversampling Technique (SMOTE). Second, features are derived and filtered, and on this basis grid search is used to tune the parameters of models including Logistic Regression (LR), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). Finally, the models with the better classification performance are integrated, and their outputs are combined in three ways: a Logistic Regression model, majority voting, and probability averaging. The comparison shows that the integrated model identifies customers more accurately. In addition, the SHapley Additive exPlanation (SHAP) interpretability method is used to interpret and analyze the best model from both local and global perspectives and to find the important features in the data that lead customers to default. Overall, the integrated model identifies customer default behavior with higher accuracy.
Keywords/Search Tags:Lending Club, Random Forest, XGBoost, LightGBM, Logistic Regression, SHapley Additive exPlanation, Grid search algorithm, unbalanced data processing, Stacking model
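
The three integration strategies named in the abstract (majority voting, probability averaging, and a Logistic Regression model on top of the base models) can be illustrated with a minimal pure-Python sketch. This is not the thesis's code: the function names, the toy probabilities, and the labels are all hypothetical, and a small hand-written gradient-descent logistic regression stands in for whatever library implementation was actually used. In a real stacking setup the meta-model would normally be fitted on out-of-fold predictions rather than on the same rows it is evaluated on.

```python
import math

def majority_vote(base_probas, threshold=0.5):
    """Label a sample 1 (default) when more than half of the base models vote 1."""
    n_models = len(base_probas)
    labels = []
    for sample in zip(*base_probas):  # one tuple of model probabilities per sample
        votes = sum(1 for p in sample if p >= threshold)
        labels.append(1 if 2 * votes > n_models else 0)
    return labels

def probability_average(base_probas, threshold=0.5):
    """Label a sample 1 when the mean predicted default probability reaches the threshold."""
    labels = []
    for sample in zip(*base_probas):
        labels.append(1 if sum(sample) / len(sample) >= threshold else 0)
    return labels

def fit_meta_logistic(base_probas, y, lr=1.0, epochs=5000):
    """Fit a logistic-regression meta-model on base-model probabilities
    using plain batch gradient descent (illustrative stand-in for a library LR)."""
    rows = list(zip(*base_probas))
    w = [0.0] * len(base_probas)
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, target in zip(rows, y):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - target  # predicted prob minus label
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def predict_meta_logistic(w, b, base_probas, threshold=0.5):
    """Apply the fitted meta-model to base-model probabilities."""
    labels = []
    for x in zip(*base_probas):
        z = b + sum(wi * xi for wi, xi in zip(w, x))
        labels.append(1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0)
    return labels

# Hypothetical default probabilities from three base models for six loans.
base_probas = [
    [0.9, 0.1, 0.8, 0.2, 0.7, 0.3],  # e.g. a random forest
    [0.8, 0.2, 0.4, 0.3, 0.9, 0.1],  # e.g. XGBoost
    [0.7, 0.3, 0.9, 0.6, 0.6, 0.2],  # e.g. LightGBM
]
labels = [1, 0, 1, 0, 1, 0]  # hypothetical ground truth: 1 = default

voted = majority_vote(base_probas)
averaged = probability_average(base_probas)
w, b = fit_meta_logistic(base_probas, labels)
stacked = predict_meta_logistic(w, b, base_probas)
print(voted, averaged, stacked)
```

The strategies can disagree. For a single loan scored 0.55, 0.55, and 0.1 by the three models, majority voting flags a default (two of three votes), while probability averaging does not (the mean is 0.4). The Logistic Regression combiner instead learns how much weight to give each base model from labeled data.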