Research On Default Prediction Of P2P Lending Based On Unbalanced Data Set

Posted on: 2021-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Lu
Full Text: PDF
GTID: 2439330620463392
Subject: Applied Statistics
Abstract/Summary:
P2P online lending is a person-to-person form of online lending. Compared with the traditional lending model, it requires no collateral, is applied for online, and disburses loans quickly. After China's first P2P platform was established in 2007, the industry went through a period of prosperity and growth and quickly became one of the important channels for personal financing. However, many problems lay behind this "prosperity and development", and platform collapses have been frequent in the P2P industry since 2016. First, the lack of a sound regulatory and legal system allowed a large number of non-compliant platforms to operate. Second, there is no mature credit system, so information asymmetry is serious. Third, many platforms lack an appropriate risk control system, which results in a high user default rate and causes great losses to investors. Improving the credit system and the platforms' ability to identify defaulting users is therefore of great significance for the healthy development of the P2P industry.

In general, defaulting users are far fewer than normal users, so online loan data sets are imbalanced. If a traditional algorithm is trained directly on such data, many problems arise. This thesis therefore improves the traditional algorithms from different perspectives. Three learning algorithms are selected: logistic regression, which is widely used in credit scoring; the BP neural network, which has strong self-learning ability; and LightGBM, which has stood out in various competitions. From the perspective of cost-sensitive learning, the cost-LR and cost-BP algorithms are proposed. From the perspective of the data, the E-LightGBM algorithm is proposed: following the idea of the EasyEnsemble algorithm, the majority class is undersampled multiple times and each sample is combined with the minority class to form multiple balanced data subsets, LightGBM is used as the base learner to train multiple models, and voting is used to obtain the final prediction.

A case study was conducted on historical transaction data from Lending Club and Paipaidai, representative P2P platforms in the United States and China, respectively. Feature selection combined a filter method with random forest feature importance, and the F2 score, G-mean, and AUC were used to evaluate model performance. The results show that all three improved algorithms alleviate the class imbalance problem to some extent, with E-LightGBM performing best. A comparison of the two platforms' data sets shows that the Lending Club data contain more complete credit history data as well as credit indicators provided by third parties, whereas the Paipaidai data consist mainly of users' basic information, social network information, and various kinds of authentication information. All three algorithms performed better on the Lending Club data set than on the Paipaidai data set. Therefore, to promote the healthy development of the P2P industry, China should connect P2P lending platforms to the bank credit reporting system as soon as possible, improve the credit system, and enable data sharing and circulation, so as to provide stronger safeguards for the industry.
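
The data-level procedure and two of the evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses only the Python standard library, leaves the base learner out entirely (the thesis uses LightGBM), and all function names are hypothetical. It assumes binary labels with 1 for default (minority) and 0 for normal (majority), at least as many majority as minority samples, and a tie-breaking rule in the vote that favours the default class. The standard definitions F2 = 5·TP / (5·TP + 4·FN + FP) and G-mean = sqrt(sensitivity × specificity) are assumed.

```python
import random
from collections import Counter


def balanced_subsets(labels, n_subsets, seed=0):
    """EasyEnsemble-style sampling: each subset keeps every minority (1)
    index and adds an equally sized random draw of majority (0) indices."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Assumes len(majority) >= len(minority); rng.sample raises otherwise.
    return [minority + rng.sample(majority, len(minority))
            for _ in range(n_subsets)]


def majority_vote(predictions_per_model):
    """Combine 0/1 predictions from several models, one list per model.
    Ties are resolved in favour of class 1 (an assumption of this sketch)."""
    n_models = len(predictions_per_model)
    return [1 if 2 * sum(votes) >= n_models else 0
            for votes in zip(*predictions_per_model)]


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) with class 1 as the positive class."""
    pairs = Counter(zip(y_true, y_pred))
    return pairs[(1, 1)], pairs[(0, 0)], pairs[(0, 1)], pairs[(1, 0)]


def f_beta(y_true, y_pred, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall) and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity * specificity) ** 0.5
```

In a fuller pipeline, one classifier would be trained on the rows selected by each index list from `balanced_subsets`, and the per-model predictions on the test set would be passed to `majority_vote` before scoring with `f_beta` and `g_mean`.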
Keywords/Search Tags: P2P lending, Default prediction, Class imbalance, cost-LR, cost-BP, E-LightGBM