The Research On P2P Credit Default Identification Based On Data Unbalanced Perspective

Posted on: 2021-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wang
Full Text: PDF
GTID: 2439330623959007
Subject: Applied Statistics
Abstract/Summary:
In recent years, as the concept of inclusive finance has spread, the P2P lending industry has ridden the “east wind” (dongfeng) of inclusive finance to its first boom. P2P online lending makes full use of its advantage in lending between individuals and quickly provides financial services to small and medium-sized enterprises and low-income individuals, so it has been eagerly pursued by capital. Precisely because of this influx of capital, the P2P industry went through a series of risk outbreaks after a period of unchecked growth. For an individual online lending platform, malicious default by users is the biggest pain point among the many risk factors. Effectively reducing the high default rate and improving the accuracy with which malicious defaults are identified is therefore an important part of a platform's risk prevention work.
In view of the high default rate in the P2P lending industry, this paper studies the default problem from the perspective of data imbalance. First, it introduces the concepts of the P2P lending business and the theory behind the data mining models, focusing on the basic idea of the improved SMOTE algorithm: a new mechanism for artificially synthesizing minority-class samples is designed from the perspective of a triangular region and coefficients. Second, it introduces the variable importance measure from rough set theory, which improves on the original mechanism of relying on a random forest to obtain importance and helps the SMOTE algorithm synthesize better minority samples. Then it takes the Lending Club data set as an example for statistical analysis and data preprocessing. Next, it establishes a logistic regression model, a random forest model and an XGBoost model, and selects accuracy, recall, F-value, G-mean and other evaluation indicators to evaluate and compare model performance. Finally, the RST_new_smt algorithm, the best performer in each model, is selected, and the work enters the model assembling phase, where the final default recognition model is established using a linear weighting method based on recall rate.
The results show that the improved SMOTE algorithm proposed in this paper helps the models identify default users, and that the introduced rough set theory helps select reasonable and important variables and indirectly improves the models' recognition accuracy. This indicates that addressing the default problem from the perspective of data imbalance is feasible. Model fusion is carried out on the basis of the logistic regression, random forest and XGBoost models, and the fused model identifies P2P loan defaults more accurately than any single model. In addition, this paper suggests that P2P lending platforms make their training data sets as complete, clean and easy to distinguish as possible, and use multiple models for default identification.
Keywords/Search Tags:P2P, default identification, SMOTE algorithm, rough set theory, XGBoost, model assembling
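
The abstract names G-mean as an evaluation indicator and a recall-based linear weighting for the final fused model, but gives no formulas for either. The Python sketch below shows one common reading, under stated assumptions: G-mean is taken as the geometric mean of sensitivity and specificity, and each model's fusion weight is its recall divided by the sum of all recalls. The function names and the proportional weighting rule are illustrative assumptions, not the thesis's actual implementation.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fn, fp, tn), with 1 = default (minority class), 0 = non-default."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def recall(y_true, y_pred):
    """Share of actual defaults that the model flags as defaults."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall) and specificity.

    Assumed definition; it stays low when either class is poorly
    recognized, which is why it suits imbalanced data.
    """
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def recall_weighted_fusion(prob_lists, recalls):
    """Linearly combine per-model default probabilities.

    prob_lists: one list of predicted default probabilities per model.
    recalls:    one validation recall per model (hypothetical inputs).
    Weights are proportional to recall; this rule is an assumption.
    """
    total = sum(recalls)
    if total == 0:
        raise ValueError("at least one model must have non-zero recall")
    weights = [r / total for r in recalls]
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists))
        for i in range(n_samples)
    ]


# Toy usage with made-up labels and probabilities (not thesis data).
labels = [1, 1, 0, 0, 0, 0]
predictions = [1, 0, 0, 0, 0, 1]
print(recall(labels, predictions), g_mean(labels, predictions))
print(recall_weighted_fusion([[0.2, 0.8], [0.6, 0.4]], [0.75, 0.25]))
```

Weighting by recall rather than accuracy favors the models that miss the fewest defaulters, which matches the abstract's focus on identifying the minority class. A threshold would still have to be applied to the fused probabilities to obtain a final default label.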