Research On P2P Risk Control Model Based On Machine Learning

Posted on: 2019-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Shan
GTID: 2416330566477029
Subject: Applied Statistics
Abstract/Summary:
Online peer-to-peer (P2P) lending not only eases the difficulty and high cost of financing for small and medium-sized enterprises, but also puts part of investors' small-scale funds to reasonable use. In recent years, however, the continued development of online P2P lending has exposed many problems. Its large risk comes partly from its own management and partly from a small number of borrowers with serious credit problems. These borrowers obtain loans fraudulently, which leads to a large number of bad or overdue debts and seriously threatens the long-term development of online P2P lending. How to evaluate the credit risk of loan applicants effectively and predict bad or overdue debts accurately has therefore become an urgent problem for the entire online P2P lending industry.

Because the credit reference system cannot effectively evaluate the credit risk of loan applicants, this thesis draws on machine learning and big data and makes full use of their advantages. We use statistical learning algorithms to mine the multi-dimensional data of online P2P lending users and build a statistical model that predicts whether a user will become overdue after receiving a loan. The main content of the thesis has four parts.

First, when a dataset has a class imbalance problem, traditional evaluation indicators such as accuracy and precision struggle to measure model performance, whereas the ROC curve and the AUC value remain effective. This thesis therefore adopts the ROC curve and AUC value as the performance indicators and computes the AUC from the model's predictions instead of a traditional index such as precision.

Second, in feature construction, we first construct features by analyzing the data set and then introduce a new construction method. In this method, the training set, including the features already constructed, is used to train an XGBoost classifier made up of 500 decision trees. Every training sample falls into one leaf node of each decision tree, and recording the index of that leaf node for each tree yields 500 new features. The final result of feature construction combines the features obtained by data analysis with the features obtained by the new method.

Third, in feature selection, we propose a new algorithm: a recursive feature elimination (RFE) method based on the XGBoost classifier. The algorithm produces a list of all features sorted by importance, from which a number of the most important features are taken as the result of feature selection.

Finally, the performance and stability of a single model are often not ideal and can be improved by ensemble learning. This thesis proposes a new ensemble learning scheme based on the performance and stability of the model: Blending serves as the overall scheme and is combined with Bagging and Stacking. Both linear and nonlinear classifiers are used with these methods to obtain the final model.
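
The first point of the abstract, that accuracy can mislead on an imbalanced dataset while AUC does not, can be illustrated with a small sketch. It is not taken from the thesis: the toy labels and scores, and the helper names `accuracy` and `auc`, are assumptions made for illustration. AUC is computed here from its pairwise definition, the probability that a randomly chosen overdue borrower receives a higher score than a randomly chosen on-time borrower.

```python
def accuracy(labels, preds):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def auc(labels, scores):
    """AUC from its pairwise definition; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 95 on-time borrowers (0) and 5 overdue borrowers (1).
labels = [0] * 95 + [1] * 5

# Model A ignores its input and scores everyone as "will repay".
constant_scores = [0.0] * 100
constant_preds = [0] * 100

# Model B ranks four of the five overdue borrowers above every on-time one.
informative_scores = [0.2] * 95 + [0.9, 0.8, 0.7, 0.6, 0.1]

acc_constant = accuracy(labels, constant_preds)   # 0.95, looks good
auc_constant = auc(labels, constant_scores)       # 0.5, no better than chance
auc_informative = auc(labels, informative_scores) # about 0.8
```

Model A reaches 95% accuracy only because 95% of the toy samples are on-time borrowers, while its AUC of 0.5 shows that it cannot rank overdue borrowers above on-time ones at all. The pairwise loop is quadratic in the sample count and is meant only to make the definition visible; on real data a library routine would normally be used.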
Keywords/Search Tags:Online peer to peer lending, Machine learning, Credit risk, Feature engineering, Model ensemble