
The Research On P2P Loan Default Risk Identification Model Based On Data Mining Technology

Posted on: 2019-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Wang
Full Text: PDF
GTID: 2359330542981743
Subject: Applied Statistics
Abstract/Summary:
With the growth of the Internet economy and increasingly varied borrowing and lending needs, P2P lending within the Internet finance industry has expanded rapidly. In recent years, however, the business has run into a series of problems, most notably bad debts driven by high default rates, which have caused losses for both P2P platforms and investors. Establishing an effective model for identifying P2P default risk is therefore of great significance for platform risk control and for the long-term development of the industry. This paper uses data mining methods to find a model that can effectively identify default risk, so that potential defaults can be predicted in time, helping P2P platforms reduce the losses caused by defaults and improve their ability to survive. First, the paper gives a basic analysis of the P2P lending business and describes the ideas and characteristics of the data mining models to be used. Next, data cleaning and feature engineering are carried out on the collected P2P borrowing records. Then, models including Logistic Regression, Neural Network, SVM, C5.0 Decision Tree, Random Forest, GBDT and XGBoost are built on the cleaned data set, and evaluation metrics such as Accuracy, Precision, Recall and F1 score are used to compare their predictive performance. Finally, with XGBoost selected as the base model and Logistic Regression as the second-stage model, the final combined default identification model is built using the Stacking method. The results show that, based on multi-dimensional data covering borrowers' basic information, education information, online behavior, social networks and third-party data, a Stacking combination model built around the widely used XGBoost model with open-source tools such as the R language achieves better prediction of default risk in P2P lending than a single prediction model. On the test data, the model not only identifies a considerable proportion of defaulting borrowers but also avoids wrongly flagging too many normal borrowers as defaulters. The results can therefore help P2P platforms predict and identify potential default risk in a timely manner, help protect investors' legitimate interests, and support supervision of the P2P industry.
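The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is an illustration only, not the thesis's R code: the function name `classification_metrics`, the convention that label 1 means "default", and the toy labels are all assumptions made for this example.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating label 1 (default) as positive.

    Hypothetical helper for illustration; labels are assumed to be 0 or 1.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaulters correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal borrowers wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaulters missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal borrowers correctly passed

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data (made up): 3 defaulters and 7 normal borrowers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the model catches two of the three defaulters (recall) and wrongly flags one normal borrower, which lowers precision. That is the trade-off the abstract describes between identifying defaulters and avoiding false alarms on normal borrowers.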
Keywords/Search Tags:Data Mining, P2P Loan Default, Data Cleaning, Feature Engineering, XGBoost, Stacking