
Credit Risk Prediction Analysis of P2P Online Lending Borrowers

Posted on: 2020-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: P Xiao
Full Text: PDF
GTID: 2439330578478880
Subject: Applied Statistics
Abstract/Summary:
Internet finance has flourished in recent years, giving rise to a variety of business models and operating mechanisms. Its growth has also brought problems such as credit risk and user fraud. P2P online lending is a prominent representative of Internet finance, so there is an urgent need for a credit scoring model that improves risk control in P2P online lending. Such a model also matters for the sustainable and healthy development of the Internet finance industry. In practice, however, the data involved come from multiple sources and are ultra-high-dimensional and sparse, which is far beyond the capacity of linear regression or logistic regression models and poses a major challenge to traditional risk control. As personal information and behavioral data become more complete, using big data mining techniques to predict an individual's future credit performance has increasingly become the mainstream approach. Making full use of big data to improve risk control is the key to transforming traditional risk control into big-data risk control.

The specific work of this paper is as follows:
1. Data acquisition and preprocessing. This paper preprocesses desensitized personal basic information and credit record data for some borrowers in the P2P online lending industry, and completes data cleaning such as removing abnormal records and filling in missing values.
2. Feature engineering. The preparatory work focuses on the features themselves: deriving feature variables, one-hot encoding qualitative variables, and min-max standardization of quantitative variables. A descriptive statistical analysis then examines how overdue payment and default relate to borrowers' personal basic information and credit records. Feature selection is completed and the final variables are summarized. The feature variables are also monitored against the macro environment and later serve as a reference for adjusting model parameter thresholds.
3. Construction of ensemble learning models. Random Forest, GBDT, XGBoost, and a stacking model are built, their feature importance plots are produced, and their results are compared and evaluated.
4. Establishment of a credit scoring model system. The best GBDT model and a scorecard model are selected to form a credit scoring model system.

This paper draws the following conclusions:
1. Feature engineering preprocessing yields user portraits of borrowers.
2. A comparison of Random Forest, GBDT, XGBoost, and stacking shows that the accuracy of all four models is above 85%, which indicates good classification performance.
3. The AUC values of the four models are all above 80%, which suggests that the four ensemble learning models fit well. The credit risk prediction for P2P borrowers in this paper performs well.
4. This paper also builds a scorecard with the help of the GBDT model, which performs better than the traditional scorecard model.

Finally, based on the research results, this paper offers some risk management countermeasures for P2P online lending platforms and for the development of the industry.
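The abstract names one-hot encoding for qualitative variables and min-max scaling for quantitative variables but does not show them. The following is a minimal sketch of both transforms in plain Python. The function names, the column names, and the sample values are hypothetical illustrations, not taken from the thesis data or code.

```python
from typing import Dict, List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values: Sequence[str]) -> Dict[str, List[int]]:
    """Return one 0/1 indicator column per distinct category, sorted by name."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical borrower records (made-up values, not from the thesis).
monthly_income = [3000.0, 8000.0, 5500.0, 12000.0]
education = ["bachelor", "high_school", "bachelor", "master"]

scaled_income = min_max_scale(monthly_income)
education_dummies = one_hot(education)
```

In practice the minimum and maximum would be computed on the training split only and reused on the test split, so that no information leaks from test data into the scaling.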
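The conclusions rely on AUC and on a scorecard derived from a model's predicted default probability. The abstract does not say how the scores are scaled, so the sketch below assumes the common "points to double the odds" convention. The base score of 600, base odds of 50:1, and 20 points per doubling are hypothetical parameters, and the labels and scores are made-up values.

```python
import math
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 (default) or 0.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def probability_to_score(
    p_default: float,
    base_score: float = 600.0,  # assumed score at the base odds
    base_odds: float = 50.0,    # assumed good:bad odds at the base score
    pdo: float = 20.0,          # assumed points needed to double the odds
) -> float:
    """Map a predicted default probability to a scorecard-style score."""
    if not 0.0 < p_default < 1.0:
        raise ValueError("p_default must lie strictly between 0 and 1")
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1.0 - p_default) / p_default
    return offset + factor * math.log(odds_good)


# Made-up labels (1 = default) and model scores, for illustration only.
example_labels = [1, 0, 1, 0, 0]
example_scores = [0.9, 0.2, 0.6, 0.7, 0.1]
example_auc = auc(example_labels, example_scores)
```

Under these assumed parameters, a borrower with good:bad odds of 50:1 receives 600 points, and each doubling of the odds adds 20 points, so lower predicted default probability always maps to a higher score.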
Keywords/Search Tags: P2P online loan, credit risk, Random Forest, GBDT, XGBoost