Research On Credit Scoring Model Based On Logistic Regression And Support Vector Machine

Posted on: 2022-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: L T Wang
Full Text: PDF
GTID: 2480306509989249
Subject: Applied Statistics
Abstract/Summary:
The development of the mobile Internet and the emergence of Internet finance companies have driven the rapid growth of Internet finance. This rapid growth has also brought problems: blind expansion and disregard for risk have caused a large number of companies to collapse and go bankrupt, and many investors have suffered heavy losses. For Internet finance companies, it is therefore extremely important to strengthen risk control and improve credit management. A credit scoring model is a tool that uses mathematical models to predict the probability of default. It is widely used in the traditional financial industry and has begun to expand into Internet finance in recent years.

This study takes the loan data of a foreign Internet company as its research object and builds a credit risk scoring model. First, weight of evidence (WOE) conversion and exploratory analysis are applied to the data, and Information Value (IV) is used for preliminary screening of variables. Then Logistic Regression and Logistic-Lasso models are established, their performance is compared, and the abnormal parameters that appear in them are analyzed. A support vector machine model is then built on the variables selected by the Logistic-Lasso model. Because the data set is large and the support vector machine is slow to solve, a two-stage method based on sampling is proposed to eliminate redundant points, which speeds up the solution while preserving classification performance. Finally, the four models are comprehensively compared.

The empirical analysis finds that the Logistic Regression model fitted after Lasso variable selection uses far fewer variables than the ordinary Logistic Regression model, yet shows no significant difference in classification performance on indicators such as AUC and the KS statistic, which supports the validity of Lasso variable selection. The analysis of the abnormal parameters finds that balancing the data in a classification problem changes the original distribution of the data, which may cause problems such as information loss or abnormal parameter estimates. The classification performance of the support vector machine is generally better than that of logistic regression, but its solution speed is slow, and removing redundant points can speed up the solution. In the empirical study, computation time is reduced by about 60%, and classification performance also improves. This shows that, for the support vector machine model, an effective way of eliminating redundant points can speed up the solution while maintaining classification performance.
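The abstract does not give the formulas used for the weight of evidence conversion or for IV screening. As a rough illustration only, the sketch below uses one common convention: for each bin, WOE is the log of the share of non-defaulters over the share of defaulters, and IV is the sum over bins of the difference in shares times the WOE. The function name `woe_iv`, the `smoothing` constant, and the toy data are assumptions made for this example and are not taken from the thesis. Some references define WOE with the opposite sign.

```python
import math
from collections import Counter


def woe_iv(bins, labels, smoothing=0.5):
    """Return ({bin: WOE}, IV) for one already-binned feature.

    labels: 1 = default (bad), 0 = non-default (good).
    smoothing: hypothetical additive constant that avoids log(0)
    when a bin contains no goods or no bads.
    """
    good, bad = Counter(), Counter()
    for b, y in zip(bins, labels):
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1

    categories = sorted(set(good) | set(bad))
    k = len(categories)
    total_good = sum(good.values()) + smoothing * k
    total_bad = sum(bad.values()) + smoothing * k

    woe, iv = {}, 0.0
    for c in categories:
        p_good = (good[c] + smoothing) / total_good  # share of goods in bin c
        p_bad = (bad[c] + smoothing) / total_bad     # share of bads in bin c
        woe[c] = math.log(p_good / p_bad)
        iv += (p_good - p_bad) * woe[c]
    return woe, iv


# Toy data (made up): a two-level feature and default flags.
bins = ["low", "low", "low", "high", "high", "high"]
labels = [0, 0, 1, 1, 1, 0]
woe, iv = woe_iv(bins, labels)
print(woe, iv)
```

Under this convention, a variable whose IV falls below a chosen cutoff would be dropped before model fitting. The cutoff itself is a modelling choice that the abstract does not state.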
Keywords/Search Tags: Credit Scoring, Logistic Regression, Lasso, Support Vector Machine