Application Of Data Mining In Internet Financial Risk Model

Posted on: 2022-10-26 | Degree: Master | Type: Thesis
Country: China | Candidate: J C Gao | Full Text: PDF
GTID: 2518306740479344 | Subject: Applied Statistics
Abstract/Summary:
With the development of computer technology, Internet finance has risen rapidly. At the same time, a large number of risk problems have emerged, among which credit default risk is particularly important. How to use data mining to extract representative information from massive, complex user data and build effective risk models is therefore key to the vigorous and healthy development of Internet finance. This thesis takes the user loan data published by a domestic Internet platform as its research object. Random forests, correlation coefficients and other methods are used for feature selection, and the selected features are used to build single models (such as logistic regression and a BP neural network), ensemble models (such as random forest, XGBoost and LightGBM) and a fusion model based on the Stackingc strategy to predict defaulting users, with the Bayesian optimization algorithm used to tune the model hyperparameters. In addition, to address the class imbalance common in financial data, the original data are balanced by oversampling (based on a weighted SMOTE algorithm) and undersampling (based on a Gaussian mixture clustering algorithm). Finally, the models fitted on the balanced data and those fitted on the original data are compared using multiple evaluation metrics. The results show that the ensemble models have better predictive ability than the single models, and that the stacking strategy further improves on the ensemble models; that balancing the original data set further improves model fit, especially the recall for defaulting users; and that oversampling based on the weighted SMOTE algorithm is suitable for single models, while undersampling based on the Gaussian mixture clustering algorithm is more suitable for ensemble models.
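As an illustration of the oversampling step mentioned in the abstract, the following is a minimal sketch of plain SMOTE-style interpolation, written with the Python standard library only. It is not the weighted variant used in the thesis, whose weighting scheme is not described here; the function name `smote_oversample`, its parameters and the toy data are all assumptions made for this example.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by plain (unweighted) SMOTE.

    Hypothetical helper for illustration; not the thesis's weighted variant.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position on the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (invented): 8 non-default users versus 3 default users.
majority = [[0.1 * i, 1.0] for i in range(8)]
minority = [[2.0, 3.0], [2.5, 3.5], [3.0, 3.0]]

new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Each synthetic point lies on a line segment between two real minority samples, so the minority class grows without exact duplicates. In practice this would be applied to the training split only, so that the evaluation data keep the original class ratio.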
Keywords/Search Tags: Internet finance, XGBoost, LightGBM, Bayesian optimization algorithm, Unbalanced data processing