
Research On Default Warning Of Personal Network Credit Based On Data Mining

Posted on: 2019-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xue
GTID: 2429330548962500
Subject: Management Science and Engineering
Abstract/Summary:
P2P network lending, a new type of inclusive financial model, originated in Europe and the United States. In the early stages of its development, because actual data were lacking, scholars focused on introducing the financial service and platform operating models. Only after the US P2P platform Prosper opened its transaction data to the public did the academic community gain rich research resources and begin to focus on the most important risk facing P2P lending: credit default. The P2P industry in China started relatively late and still needs to learn from the experience of European and American countries. At present, research on P2P network lending in China remains at a shallow level, mainly covering the operating modes of P2P platforms, the factors that influence credit default, and industry supervision. Few studies build an early warning model of personal credit default through data mining. This paper uses website data from the US P2P market, which has already entered a period of stable development, to carry out an empirical study, in order to provide a reference for how investors and P2P platforms can effectively avoid credit default.

The main purpose of this study is to predict whether a loan will default. Personal credit default is treated as a binary classification problem, and an early warning model is built with random forest, an ensemble learning algorithm based on bagging, using CART as the base learner. The data are real transaction records from 2005 to 2014 provided by Prosper, containing 113,937 instances and 81 attributes. Because the study is concerned with the difference between completed loans and defaulted loans, records whose loan status is current, charged off, or cancelled are deleted; defaulted loans make up about 10.8% of the final data set.

After analyzing how personal credit default arises, this paper examines the factors that affect it from four aspects: the borrower's basic information, the borrower's economic information, the borrower's credit history, and the loan information. The results show that the borrower's geographic location, type of work, monthly income, and debt collection records, together with the purpose and interest rate of the loan, affect whether the borrower will default. The number of the borrower's defaulted accounts is particularly closely related to default. On this basis, feature selection is used to delete variables that have poor discriminating ability, have no causal relationship with default, or are of low importance. Finally, 24 indicators are selected to form a personal credit default evaluation system.

The empirical results on the real Prosper data set show that the random forest classifier has the highest recall compared with the CART, LDA, and LR classifiers. This shows that the advantage of random forest lies in its ability to correctly identify defaulting users in the sample, which makes it more suitable for the personal credit default warning problem. In addition, personal credit default suffers from serious data imbalance: according to the data published by Prosper, the loan default rate remains at a low level of about 10%. To address this, the weighted random forest (WRF) algorithm is used to assign a larger weight to the minority class and thereby increase the cost of misclassifying it. The results show that WRF raises recall to 62% and obtains the highest AUC score and out-of-bag score. This indicates that the model not only predicts defaulting users correctly but also generalizes well. It can help investors make investment decisions to some extent, protect their fundamental interests, and promote the healthy and stable development of the P2P network lending market.
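The class-weighting idea behind the weighted random forest described above can be illustrated with a small pure-Python sketch. It is not the thesis's implementation: the abstract does not state the actual weights or how they enter the forest, so the inverse-frequency formula, the weighted vote, the function names, and the toy labels (10% defaults) are all assumptions chosen for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed weighting scheme; the thesis's actual weights are not given.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def weighted_vote(tree_votes, weights):
    """Combine per-tree votes, scaling each vote by its class weight."""
    totals = Counter()
    for vote in tree_votes:
        totals[vote] += weights[vote]
    return max(totals, key=totals.get)


def recall(y_true, y_pred, positive=1):
    """Share of actual positives (defaults) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels (hypothetical): 1 = default, 0 = completed loan.
y = [1] * 10 + [0] * 90
w = balanced_class_weights(y)

# Three hypothetical trees vote "completed", one votes "default".
decision = weighted_vote([0, 0, 0, 1], w)
```

With these toy labels the default class receives a weight nine times that of the completed class, so a single "default" vote outweighs three "completed" votes. This is the mechanism by which weighting trades some precision for higher recall on the minority class.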
Keywords: Data Mining, Random Forest, P2P, Credit Default