Study On Default Risk Assessment Methods With Missing Data For Online P2P Lending

Posted on: 2020-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: T G Xu
GTID: 2439330578966000
Subject: Management Science and Engineering
Abstract/Summary:
Compared with traditional loans, P2P loans carry a more serious default risk because of their virtual nature and information asymmetry. Effective risk management is the basic guarantee for the steady development of P2P platforms. However, the breadth and diversity of P2P data make missing data a more serious problem. This thesis addresses how to use incomplete data from P2P platforms to train effective default risk models. Incomplete data on P2P platforms fall into two categories, missing attribute values and missing class labels, and both can be used to help improve the performance of default risk assessment models.

To handle missing attribute values, traditional classification algorithms must first fill in the missing values under the missing-at-random assumption and then train the model. However, missing data usually arise from a mixture of three missing mechanisms, and assuming a single mechanism affects the imputation results and, in turn, the performance of the model. This thesis therefore models the data with a tree model that is highly robust to missing data and proposes the LightGBM classification algorithm for building the default risk assessment model, which requires no prior imputation and is also highly efficient. An empirical analysis on data from the Renrendai platform shows that modeling directly with LightGBM outperforms the traditional imputation approach.

To handle missing class labels, traditional default risk assessment methods use only labeled samples for modeling. However, the resulting model is applied to the full sample, which introduces sample selection bias and degrades model performance. Reject inference on unlabeled samples is therefore necessary to help correct this bias. This thesis uses a semi-supervised approach to build the default risk model and proposes TRICMV, a collaborative training model based on sample and feature differences. The model adopts a model voting mechanism based on "multi-view learning" and an adaptive model iteration mechanism based on "noise learning theory", which together control the noise added to the model. The results of the empirical analysis verify the validity of the TRICMV model.
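The claim that a tree model needs no prior imputation can be illustrated with a small sketch. The code below is not the thesis's implementation and does not call LightGBM. It is a simplified, pure-Python illustration of the general idea that tree learners of this kind can learn, at each split, which branch missing values should follow. As assumptions for illustration only, it scores splits with Gini impurity rather than LightGBM's gradient-based gain, and the feature values, labels, and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def weighted_gini(left, right):
    """Size-weighted average impurity of the two child nodes."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def best_split_with_missing(values, labels):
    """Find the best threshold on one feature without imputing.

    `values` may contain None for missing entries. For every candidate
    threshold, missing entries are tried on the left and on the right,
    and the direction giving the lower impurity is kept.
    Returns (threshold, missing_goes_left, impurity), or None when
    fewer than two distinct observed values exist.
    """
    observed = sorted({v for v in values if v is not None})
    # Candidate thresholds: midpoints between consecutive observed values.
    thresholds = [(a + b) / 2.0 for a, b in zip(observed, observed[1:])]
    best = None
    for t in thresholds:
        for missing_left in (True, False):
            left, right = [], []
            for v, y in zip(values, labels):
                goes_left = missing_left if v is None else v <= t
                (left if goes_left else right).append(y)
            score = weighted_gini(left, right)
            if best is None or score < best[2]:
                best = (t, missing_left, score)
    return best


# Hypothetical toy data: a numeric borrower attribute with missing entries.
# Label 1 = default, 0 = repaid. Here the missing entries all defaulted,
# so the missingness itself carries information (not missing at random).
values = [2.0, 3.0, 4.0, 8.0, 9.0, 10.0, None, None, None]
labels = [1, 1, 1, 0, 0, 0, 1, 1, 1]

threshold, missing_goes_left, impurity = best_split_with_missing(values, labels)
print(threshold, missing_goes_left, impurity)
```

On this toy data the split routes the missing entries to the same branch as the low-valued, defaulting borrowers, which a fill-in step based on a single missing mechanism could obscure by replacing them with a mid-range value.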
Keywords/Search Tags:P2P lending, default risk assessment, missing data, reject inference