Research On Credit Default Prediction Based On Stacking

Posted on: 2024-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
Full Text: PDF
GTID: 2530307079491504
Subject: Applied statistics
Abstract/Summary:
Personal credit is a main business of banks and other financial institutions, and with the development of the Internet it has gradually become the core business of internet finance companies as well. While personal credit brings profit to these enterprises, it also carries substantial default risk. How to apply new technologies such as machine learning to credit, control risk, and effectively improve returns has therefore become a focus for both traditional financial institutions and internet finance companies.

This thesis constructs a Stacking-based fusion model to predict credit default risk. It compares the traditional Logistic regression model with XGBoost, LightGBM, and CatBoost, the most widely used ensemble learning algorithms of the boosting family, and explores a Stacking fusion model in order to find the best credit risk prediction method and keep default risk under reasonable control. It also introduces Profit Rate, a new feature that measures the benefit a model can deliver, to further help companies quantify their earnings.

The research proceeds in three steps. First, the raw data is explored and preprocessed: descriptive statistics, distribution and visualization of variables, identification and treatment of outliers and missing values, and feature selection. Second, the processed data is applied to Logistic regression, XGBoost, LightGBM, and CatBoost separately, first as single models with default parameters and then with tuned parameters. Finally, the Stacking fusion method is used to build a mixed model with XGBoost, LightGBM, and CatBoost as the first layer and Logistic regression as the second layer, where the output of the first layer serves as the input of the second, yielding an efficient fusion model. The same dataset is then applied to the Stacking fusion model and the four single models, and the five models are compared on credit default prediction using AUC as the performance measure.

The results show that the boosting-family ensemble models perform far better than the traditional Logistic regression model, and that the Stacking fusion model significantly improves credit default prediction compared with the single models. From a practical perspective, this thesis aims to identify better risk prediction methods, help credit companies improve their risk management capabilities, and provide new ideas for improving the performance of machine learning models.
Keywords/Search Tags: Default prediction, machine learning, ensemble algorithm, model fusion