Research On Personal Credit Scoring Based On The Boosting Algorithm

Posted on:2024-02-19

Degree:Master

Type:Thesis

Country:China

Candidate:B E Yi

Full Text:PDF

GTID:2568307106986149

Subject:Applied statistics

Abstract/Summary:

Credit scores are widely used across industries to assess the creditworthiness of customers. In banking, they determine whether customers qualify for loans and credit cards and set the corresponding credit limits. On e-commerce platforms, credit scores also affect the credit limits customers can obtain. In addition, credit scores play an important role in the construction of national credit systems, such as China’s social credit system.

Credit scoring can be framed as the prediction of a continuous variable or as a classification problem, for example binary or ternary classification. Multiple linear discriminant analysis was a common early method for such classification problems. As research progressed, more methods emerged, including logistic regression, early neural networks, support vector machines, decision trees, and tree-based ensembles such as random forests. In recent years, advances in computing power and the rapid growth of data volume have produced many more approaches to credit scoring classification, and a range of machine learning and deep learning algorithms have performed well on classification problems with real datasets.

This thesis takes the classical logistic regression model as the benchmark and compares it with the random forest model and the newer algorithms XGBoost and CatBoost. AUC, the K-S value, the PSI value, and balanced accuracy (BA) serve as the primary evaluation indicators. The thesis uses a publicly available dataset from Kaggle and implements the modeling process in R, with a training-to-test split ratio of 4:1. The modeling process and results are as follows.

First, a random forest model was built to classify the credit score dataset. Because the random forest model is sensitive to its parameters, a grid search with 10-fold cross-validation was used to tune mtry and ntree. The search took 10088.3 seconds, and the optimal parameters were mtry=5 and ntree=1000. The model was then built with the optimal parameters and evaluated on the test set, giving an AUC of 0.8785, K-S of 0.63, PSI of 0.01, and BA of 0.8308. Compared with the logistic regression model, the AUC improved by 11.08%.

Second, an XGBoost model was built to classify the credit score dataset. A grid search with 5-fold cross-validation was used to tune the parameters, and the optimal values were nrounds=200, max_depth=9, eta=0.05, min_child_weight=0.7, and subsamp=0.8. The search took 13248.26 seconds. The model was then built with the optimal parameters and evaluated on the test set, giving an AUC of 0.897, K-S of 0.67, PSI of 0.03, and BA of 0.83, an AUC improvement of 13.43% over the logistic regression model.

Finally, a CatBoost model was built to classify the credit score dataset. A grid search with 5-fold cross-validation was used to tune the parameters, and the optimal values were depth=8, learningrate=0.05, and iterations=400. The model was then built with the optimal parameters and evaluated on the test set, giving an AUC of 0.8196, K-S of 0.61, PSI of 0.02, and BA of 0.8196, an AUC improvement of 3.63% over the logistic regression model.

The thesis concludes that, when only classification accuracy is considered, the XGBoost and random forest models perform similarly, CatBoost is slightly worse, and logistic regression is the worst. However, the XGBoost model showed severe overfitting, so overall the random forest model performed best.
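
The four evaluation indicators named in the abstract can be illustrated with a small sketch. This is not the thesis's code: the thesis reports using R, while the example below uses plain Python for illustration. The function names (`auc`, `ks_statistic`, `balanced_accuracy`, `psi`) are hypothetical, and the PSI binning (ten equal-width bins on scores in [0, 1], with a small floor to avoid taking the log of zero) is an assumption, since the abstract does not state how the thesis computed it.

```python
from bisect import bisect_right
from math import log


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true, scores):
    """K-S value: largest gap between the two classes' cumulative score distributions."""
    pos = sorted(s for y, s in zip(y_true, scores) if y == 1)
    neg = sorted(s for y, s in zip(y_true, scores) if y == 0)
    return max(
        abs(bisect_right(pos, t) / len(pos) - bisect_right(neg, t) / len(neg))
        for t in set(scores)
    )


def balanced_accuracy(y_true, y_pred):
    """BA: mean of recall on class 1 (sensitivity) and recall on class 0 (specificity)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def psi(expected, actual, n_bins=10, eps=1e-6):
    """PSI between two score samples, assuming scores lie in [0, 1] (equal-width bins)."""
    def shares(values):
        counts = [0] * n_bins
        for v in values:
            counts[min(int(v * n_bins), n_bins - 1)] += 1
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    y = [0, 0, 1, 1, 0, 1]
    s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
    preds = [1 if v >= 0.5 else 0 for v in s]
    print("AUC:", auc(y, s))
    print("K-S:", ks_statistic(y, s))
    print("BA :", balanced_accuracy(y, preds))
    print("PSI:", psi(s, s))
```

In practice PSI is usually computed between the score distribution on the training set and the one on the test set (or a later sample), so a small value such as those reported above suggests the score distribution is stable across the two samples.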

Keywords/Search Tags:

Credit Scoring, Random Forest, XGBoost, CatBoost

Related items

1	Research On Personal Credit Default Prediction Based On XGBoost+RF
2	Design And Implementation Of Credit Rating System Using XGBoost Algorithm
3	Analysis Of Loan Default Prediction Based On Ensemble Learning Algorithm
4	Application Of Data Mining In Personal Credit Risk Identification Of P2P Online Loan
5	Credit Risk Assessment Based On Improved Random Forest
6	Application Research Of Credit Card User Credit Risk Prediction Model Based On CatBoost Algorithm
7	Research On Precision Delivery Of Consumer Vouchers Based On XGBoost And CatBoost
8	Research On Credit Scoring Model Based On Machine Learning
9	Research On Enterprise Credit Assessment Based On Model Fusion
10	Design And Analysis Of Personal Credit Scorecard Based On Logistic Regression