
Research on Predicting Whether Users Will Apply for a Second Loan Based on a Combined Model

Posted on: 2019-02-14 | Degree: Master | Type: Thesis
Country: China | Candidate: H Zhao
GTID: 2359330548458259 | Subject: Applied statistics
Abstract/Summary:
P2P micro-lending is a business model that pools funds from lenders to meet borrowers' financing needs. Lenders and borrowers generally establish the lending relationship and complete the related transaction procedures through a professional online (e-commerce) platform [1]. As Internet finance companies have developed, the credit information available on borrowers has become increasingly complete, and researchers have already carried out extensive predictive work in this area. Personal loan forecasting is therefore no longer limited to judging whether a user is a good or bad customer, or whether the user will default on a loan. As demand for loan volume grows, increasing the stickiness of borrowing customers and improving their product experience can raise borrower lifetime value and substantially increase company revenue. Whether a customer will take out a second loan is therefore an important research question, and it is the focus of this paper.

This project uses public, real loan data from the Rong360 data analysis competition (26,001 records and 432 variables). First, data mining techniques and a set of operating procedures are used to discover patterns in customer behavior and to identify the main aspects of users' habits and preferences. Next, a fusion model built on recent decision-tree-based machine learning algorithms is established to accurately identify the borrowers who will take out a second loan. Finally, the main factors affecting personal second loans are derived from the established models to support decision makers.

In the data preparation part, drawing on working experience and multiple experiments, we discuss data preprocessing, the overall distribution of the data, exploratory analysis, and the construction of feature engineering, including optimal binning of continuous variables, filling of missing values, and detection and treatment of outliers. These methods are chosen to minimize the loss of variable information. Based on an understanding of the business background, we establish a feature engineering framework and discuss variable selection, introducing two common methods, the maximal information coefficient (MIC) and regularization (L1, L2), to reduce the correlation among variables during selection. Understanding the relationship between the dependent variable and the explanatory variables provides useful guidance for decision makers.

In the modeling part, based on the business background and the model algorithms, we first build three decision-tree-based ensemble learning models: random forest reaches an AUC of 0.7765, LightGBM 0.765, and gcForest 0.775. Next, to improve predictive performance, and using AUC as the evaluation criterion, we construct a combination model and an AUC-based fusion model from the three models. The fused model reaches a test-set AUC of 0.78925552, an improvement of about 0.013 over the best single model (random forest, 0.7765).

Finally, the random forest feature importances identify eight variables as important for the target variable: credit, creditlmtamt, maxtmencode, friendscount, currentbillbal, expectquota, missingcount, and maxmonthrepaymissing. We study them further from three main aspects: first, users' online shopping behavior; second, users' personal information status; and third, users' social network behavior. Exploratory analysis of these variables is carried out to examine them in more depth.
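The abstract does not say how missing values and outliers were handled. As a minimal sketch, assuming median filling and Tukey-fence (IQR) clipping, which are common choices but not confirmed by the thesis, the preprocessing step could look like this in plain Python. The per-row `missing_count` feature is a hypothetical stand-in suggested by the `missingcount` variable in the importance list; the thesis does not state how that variable was defined.

```python
from statistics import median


def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def preprocess(rows, k=1.5):
    """Fill missing values with the column median and clip outliers.

    rows: list of dicts mapping column name -> float, or None when missing.
    Each output row also gets a hypothetical 'missing_count' feature (the
    number of missing fields in that row, counted before filling).
    Values are clipped to the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].
    """
    columns = sorted({c for r in rows for c in r})
    stats = {}
    for c in columns:
        observed = sorted(r[c] for r in rows if r.get(c) is not None)
        if not observed:  # column entirely missing: fill with 0, no clipping
            stats[c] = (0.0, float("-inf"), float("inf"))
            continue
        q1, q3 = quantile(observed, 0.25), quantile(observed, 0.75)
        iqr = q3 - q1
        stats[c] = (median(observed), q1 - k * iqr, q3 + k * iqr)

    out = []
    for r in rows:
        new = {"missing_count": sum(1 for c in columns if r.get(c) is None)}
        for c in columns:
            fill, low, high = stats[c]
            v = r.get(c)
            v = fill if v is None else v
            new[c] = min(max(v, low), high)
        out.append(new)
    return out
```

For example, with a column `a` observed as 1, 2, 3 and 100, the upper fence is 65.5, so the value 100 is clipped to 65.5. In practice the fill values and fences should be computed on the training split only and then reused on the test split.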
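The abstract mentions "optimal binning" of continuous variables without naming the method. The sketch below substitutes a simpler technique, equal-frequency (quantile) binning, and scores the resulting bins with weight of evidence (WOE) and information value (IV), a common way to judge binning quality in credit modeling. A true optimal binner would also search over cut points, for example to maximize IV; that search is not shown. The label convention is an assumption: `y == 1` stands for the event (here, hypothetically, "took a second loan"), and WOE is defined as ln(event share / non-event share).

```python
import bisect
import math


def equal_frequency_edges(values, n_bins):
    """Interior cut points splitting the sorted values into roughly equal-count bins."""
    s = sorted(values)
    edges = []
    for i in range(1, n_bins):
        cut = s[i * len(s) // n_bins]
        if not edges or cut > edges[-1]:  # skip duplicate cuts caused by tied values
            edges.append(cut)
    return edges


def assign_bin(x, edges):
    """Bin index of x: bin 0 is x < edges[0], bin i covers [edges[i-1], edges[i])."""
    return bisect.bisect_right(edges, x)


def woe_iv(values, labels, edges, eps=0.5):
    """Per-bin WOE = ln(event share / non-event share), plus the total IV.

    eps is additive smoothing so a bin with no events (or no non-events)
    does not produce log(0).
    """
    n_bins = len(edges) + 1
    events = [0] * n_bins
    non_events = [0] * n_bins
    for x, y in zip(values, labels):
        b = assign_bin(x, edges)
        if y == 1:
            events[b] += 1
        else:
            non_events[b] += 1
    total_e = sum(events) + eps * n_bins
    total_n = sum(non_events) + eps * n_bins
    woe, iv = [], 0.0
    for e, ne in zip(events, non_events):
        pe = (e + eps) / total_e
        pn = (ne + eps) / total_n
        w = math.log(pe / pn)
        woe.append(w)
        iv += (pe - pn) * w  # each term is >= 0, so IV >= 0
    return woe, iv
```

A feature whose bins separate events from non-events gets a large IV, while a feature with the same event rate in every bin gets an IV of zero, which is how binned variables are often screened before modeling.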
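For the L1 regularization named in the variable-selection step, one minimal sketch is an L1-penalized logistic regression fitted by proximal gradient descent (ISTA). The soft-thresholding step sets weak coefficients exactly to zero, which is what makes the L1 penalty usable for dropping variables. This is an illustrative stand-in, not the thesis's own procedure; the penalty `lam`, step size `lr` and iteration count are arbitrary assumptions, and features are assumed to be standardized beforehand.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(w, t):
    """Proximal operator of t*|w|: shrink w toward zero, snapping |w| <= t to 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=500):
    """ISTA fit of mean logistic loss + lam * sum(|w_j|); intercept unpenalized.

    X: list of feature rows, y: list of 0/1 labels.
    Returns (intercept, weights); a weight of exactly 0.0 means the
    corresponding variable was dropped.
    """
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r / n
            for j in range(d):
                grad_w[j] += r * xi[j] / n
        b -= lr * grad_b  # plain gradient step for the intercept
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return b, w
```

Raising `lam` removes more variables; with a large enough penalty every weight becomes zero. An L2 penalty, by contrast, only shrinks weights and would not by itself perform selection.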
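The abstract does not define its "AUC fusion model". One plausible reading, assumed here and not confirmed by the thesis, is a weighted average of the three models' scores with weights proportional to each model's AUC. Because random forest, LightGBM and gcForest scores need not be on the same scale, this sketch first converts each model's scores to normalized ranks. AUC itself is computed as the Mann-Whitney rank statistic, the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
def average_ranks(values):
    """1-based ranks of values in their original order; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic (labels are 0/1)."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auc_weighted_blend(model_scores, model_aucs):
    """Hypothetical AUC-weighted fusion of several models' scores.

    Each model's scores are mapped to ranks / n in (0, 1], then averaged
    with weights proportional to that model's (validation) AUC.
    """
    n = len(model_scores[0])
    total = sum(model_aucs)
    blended = [0.0] * n
    for scores, a in zip(model_scores, model_aucs):
        ranks = average_ranks(scores)
        for i in range(n):
            blended[i] += (a / total) * ranks[i] / n
    return blended
```

The model AUCs used as weights should be measured on a validation split rather than the test split; otherwise the reported test AUC of the fused model is optimistically biased. Weighting by AUC minus 0.5 (the margin over random guessing) is a common alternative that separates strong and weak models more sharply.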
Keywords/Search Tags: P2P, Feature engineering, Random forest, LightGBM, gcForest, Model integration