
Application Of Model Fusion In Monitoring Loan Default Risk

Posted on: 2022-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: X D Tan
GTID: 2480306326953929
Subject: Applied Statistics
Abstract/Summary:
With the development of the credit business, the number and amount of loans have grown rapidly, and the accompanying problems and risks have grown with them. Default risk is the main risk faced by China's lending business. If potential defaulters can be accurately identified and handled separately, defaults can be avoided to some extent and the bad-debt rate of banks and P2P lenders can be reduced, thereby improving the rationality of resource allocation and the utilization rate of social funds. Research on loan default prediction has traditionally relied on logistic regression models. In the era of big data, the magnitude, dimensionality, and generation speed of data have made a qualitative leap, and the risk prevention and control capability of traditional loan default risk assessment models has weakened to some extent when faced with such large volumes of data. It is therefore of great practical significance to adopt more innovative models and information-based methods to assess credit risk. In recent years, decision tree algorithms, machine learning algorithms such as SVM, and tree-derived algorithms such as CatBoost have emerged one after another, all better able to adapt to the challenges of massive data. If the advantages of each model can be combined, more accurate prediction results can be achieved.

Aiming at the imbalance, high dimensionality, and large magnitude of loan default data, this thesis adopts model fusion to build a two-layer strong learner that predicts the risk of loan default. First, the data are preprocessed with respect to missing values, outliers, and the same-value rate. Second, feature engineering is carried out to remove noise, drawing on feature correlation, feature binning, and information value (IV). Third, to address the class imbalance, the SMOTE algorithm is used for oversampling to balance the number of positive and negative samples. Fourth, on the balanced data, several machine learning algorithms commonly used for binary classification are fitted as single models, and their assessment of loan default risk is evaluated with indicators such as accuracy, recall, and the ROC curve. Finally, Random Forest, XGBoost, and LightGBM, which perform better as single models, are selected as the base models, and three model fusion methods (Voting, Blending, and Stacking) are used to build multiple two-layer learners to predict loan default.

Comparing the three single models with each fusion model shows that the fusion models integrate the strengths of the single models and improve the evaluation indicators, with the model built by Stacking performing best. It is concluded that model fusion can combine the advantages of each single model to produce a model that efficiently identifies the risk of loan default, thereby reducing default risk and helping the lending industry to develop in a healthy and sustainable way.
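The oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it is a simplified, standard-library-only version of the SMOTE idea (interpolating between a minority sample and one of its nearest minority-class neighbours). The function name `smote`, its parameters, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Simplified SMOTE-style sketch; the name and signature are hypothetical."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (made up): 8 non-default rows and 3 default rows, two features each.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[1.0, 5.0], [2.0, 6.0], [3.0, 5.5]]

# Generate just enough synthetic default rows to match the majority class.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

Because each synthetic row lies on a segment between two real minority rows, the new points stay inside the region already occupied by the minority class. In practice oversampling is normally applied to the training split only, so that evaluation data contain no synthetic rows.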
Keywords/Search Tags: Model fusion, loan default prediction, machine learning, feature engineering