Research And Application Of Personal Credit Risk Assessment Based On Internet Data

Posted on: 2018-04-20  Degree: Master  Type: Thesis
Country: China  Candidate: Q Xiao  Full Text: PDF
GTID: 2359330512484739  Subject: Engineering
Abstract/Summary:
With the rapid development of consumer finance, people increasingly enjoy the convenience that credit brings to daily life. However, in the Internet economy, the drawbacks of credit models based on traditional data have gradually become apparent. Traditional data has the following limitations: its authenticity cannot be verified, it does not change dynamically, and its coverage is incomplete. Internet data, by contrast, records users' real behavior on the Internet and is dynamic, so it can compensate for the limitations of traditional data. It is therefore of great value and practical significance to study personal credit risk assessment based on Internet data. Using Internet data for personal credit risk assessment raises two core problems. First, how can an effective index system for personal credit risk assessment be established from Internet data? Traditional data comes from the user's application, so the traditional index system is simple and fixed. Internet data, however, covers a wide range, so a large amount of analysis and data mining is needed to construct valuable indicators from the massive volume of data. Second, how can an effective personal credit risk assessment model be built for Internet data?
At present, most studies build their models on traditional data. Because Internet data is noisy, high-dimensional, and sparse, it is difficult to achieve good predictive performance with traditional algorithms alone. This thesis therefore addresses the two problems above. The main work and innovations are as follows:

1) An index system for personal credit risk assessment based on Internet data is proposed and established. First, by analyzing the disadvantages of the traditional index system and the characteristics of Internet data, this thesis proposes an index system based on Internet data. Then, on the basis of actual Internet data, an index system containing tens of thousands of features is constructed. Finally, the index system is optimized using the missing-value rate (vacancy rate) and the information value (IV) statistic.

2) A model of personal credit risk assessment based on Internet data is built and achieves good results. After analyzing the advantages and disadvantages of logistic regression, decision trees, random forests, and other methods of personal credit risk assessment, this thesis chooses logistic regression, a traditional statistical method, and the gradient boosting decision tree (GBDT), a nonparametric model, to build personal credit risk assessment models on Internet data. On the test set, the logistic regression model reaches an AUC of 0.71 and a KS of 0.35, and the GBDT model reaches an AUC of 0.73 and a KS of 0.37.

3) A fusion model of personal credit risk assessment based on GBDT and logistic regression is proposed. After comparing GBDT with logistic regression in terms of advantages, disadvantages, complementarity, and classification performance on the textual data, this thesis establishes a fusion model that uses GBDT to discretize features for logistic regression. The empirical study shows that the fusion model improves significantly on the single models in both classification accuracy and generalization ability.
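
The three quantities the abstract relies on (IV for feature screening, KS for evaluation, and GBDT-based feature discretization for the fusion model) can be illustrated with a minimal pure-Python sketch. This is not the thesis's implementation: the function names, the toy trees, and the feature names `logins_per_week` and `late_payments` are hypothetical, and the hand-written trees merely stand in for fitted GBDT trees. It assumes label 1 means a bad (defaulting) user and that a higher score means higher risk.

```python
import math


def information_value(bins):
    """IV of one binned feature.

    bins: list of (good_count, bad_count) per bin; all counts must be > 0.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        good_share = good / total_good
        bad_share = bad / total_bad
        woe = math.log(good_share / bad_share)  # weight of evidence of the bin
        iv += (good_share - bad_share) * woe
    return iv


def ks_statistic(scores, labels):
    """Largest gap between cumulative bad and good rates over all thresholds.

    labels: 1 = bad, 0 = good (assumed convention). Tied scores move together.
    """
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best


def leaf_one_hot(sample, trees):
    """Encode a sample as the one-hot leaf index of every tree.

    trees: list of (leaf_fn, n_leaves); leaf_fn maps a sample to a leaf index.
    The resulting 0/1 vector is what a logistic regression would consume.
    """
    encoded = []
    for leaf_fn, n_leaves in trees:
        leaf = leaf_fn(sample)
        encoded.extend(1 if k == leaf else 0 for k in range(n_leaves))
    return encoded


# Hypothetical stand-ins for two fitted GBDT trees (2 and 3 leaves).
toy_trees = [
    (lambda s: 0 if s["logins_per_week"] < 3 else 1, 2),
    (lambda s: 0 if s["late_payments"] == 0
     else (1 if s["late_payments"] < 3 else 2), 3),
]

sample = {"logins_per_week": 5, "late_payments": 1}
print(leaf_one_hot(sample, toy_trees))
print(information_value([(80, 20), (20, 80)]))
print(ks_statistic([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
```

Each leaf of a tree corresponds to a conjunction of threshold conditions, so the one-hot leaf vector acts as both a discretization and a combination of the raw features, which is presumably why the keywords list feature discretization and feature combination together. In a real pipeline the leaf indices would come from a trained GBDT model rather than from hand-written rules.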
Keywords/Search Tags: Internet credit, GBDT, Logistic regression, feature discretization, feature combination