Research On The Influencing Factors And Lending Decision Model Of P2P Online Loan Lending Results Based On Data Mining

Posted on: 2018-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: G J Wei
GTID: 2359330515480568
Subject: Applied statistics
Abstract/Summary:
P2P lending refers to unsecured loans made between lenders and borrowers through an online lending platform rather than through financial institutions. Since 2015, P2P lending in China has developed rapidly. According to the 2015 annual report on China's P2P lending industry, the number of P2P lending platforms increased from 2918 to 5121. The report shows that the cumulative annual volume was 252.8 billion yuan in 2014 and reached 982.04 billion yuan in 2015. However, by the end of February 2017, a total of 5882 P2P lending platforms had been set up in China, of which 3547 had been suspended or were regarded as problem platforms. Controlling the risk of P2P lending platforms is therefore urgent. Based on real loan data from the "HaoDai P2P Lending Platform network", this article identifies the significant factors that influence the loan result from a series of applicant characteristic variables, and establishes an effective credit scoring model to predict an applicant's loan result. The details are as follows.

In the data preprocessing part, I use SQL to combine the application variables in the application table with the applicant characteristic variables in the applicant information table into a single analysis table. First, invalid data are deleted through logical checks. Second, missing values are imputed with the KNN method and outliers are handled with WOE binning. In the end, 3003 valid records and 20 applicant characteristic variables were obtained.

In the significant factor recognition part, I use Information Value to select 14 variables that are significant for the loan result. A Random Forest model is then used to calculate the mean decrease in Gini value for each significant variable, which gives a ranking of each variable's impact on the loan result: the larger the mean decrease in Gini value, the greater the impact. The results are as follows: the biggest influence on the loan result is the applicant's previous credit record, next are the applicant's occupation and asset status, followed by the loan amount and loan period, while personal characteristics such as gender and marital status have little influence. Lastly, the ratio of successful to failed applications is used to examine the variables more closely. The result suggests that an applicant who holds a credit card has a success rate 20 times higher than an applicant without one. Moreover, the maximum limit on a single card, the time since the account was opened, salary, years of working and education level are all positively related to the loan success rate.

In the modeling part, logistic regression, SVM, neural network, AdaBoost, GBDT and XGBoost are applied. First, applicants are grouped by K-means clustering and the characteristics of each type of applicant are summarized; then the prediction results of each model on the different types of applicants are combined. The combined result after clustering improves markedly. Specifically, the accuracy, sensitivity and specificity of the model increase by 3.31%, 17.39% and 11.05% respectively after clustering; that is, K-means clustering raises sensitivity by 17.39% and reduces the risk of misjudging a defaulting applicant as a reliable applicant by 11.05%.

Hence the conclusions are as follows. Applicants differ greatly from one another. Building a single model on all types of applicants as a whole ignores the differences between types and therefore lowers accuracy. Grouping applicants with K-means clustering first improves the models' ability to capture the distinct characteristics of different applicants, so the accuracy of the model increases.
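The reported gains are stated in terms of accuracy, sensitivity and specificity. As a rough sketch of how these three rates follow from predicted and actual loan results, the snippet below assumes that the positive class (1) is a reliable applicant and the negative class (0) is a defaulting applicant, so that a false positive is a defaulting applicant misjudged as reliable. The function name `classification_rates` and the toy labels are hypothetical and are not taken from the thesis.

```python
def classification_rates(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = reliable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # reliable, judged reliable
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # reliable, judged default
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # default, judged default
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # default, judged reliable

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of reliable applicants recognized
        "specificity": tn / (tn + fp),  # share of defaulting applicants recognized
    }


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

rates = classification_rates(y_true, y_pred)
print(rates)
```

Under this convention, a rise in specificity means fewer defaulting applicants are misjudged as reliable, which matches the way the abstract interprets the 11.05% figure.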
Keywords/Search Tags: P2P lending, Random Forest model, K-means clustering