Research On Ensemble Model For Credit Scoring And Its Application

Posted on: 2012-02-06  Degree: Doctor  Type: Dissertation
Country: China  Candidate: H Xiang
GTID: 1220330374491633  Subject: Finance
Abstract/Summary:
Personal credit is not only an ethical and moral foundation for culture and the market, but also a valuable resource for national economic development. Exploiting personal credit resources can effectively optimize resource allocation, stimulate consumption, and ultimately promote economic development. Some developed countries established their personal credit systems as early as a hundred years ago, which laid a solid foundation for the market economy. In China, the establishment of a personal credit system began in 2000, and research on and applications of personal credit lag far behind those of the developed countries. Personal credit scoring is the core of a personal credit system. Accurate assessment of personal credit contributes to the development of consumer credit services and the reduction of consumer credit risk.
Based on an analysis of the development history of credit scoring and the application of credit scoring models, this dissertation treats credit scoring as a complex evaluation system that includes the imputation of missing values, the detection and handling of outliers, sample optimization, index system selection, scoring model design, and the evaluation and application of the scoring model. For the missing data problem, different methods are compared, and the results show that deletion is the best choice when less than 10% of the data is missing, while multiple imputation is the most effective method when 20%-40% of the data is missing. For the detection and handling of outliers, a combined method is proposed, and the experimental results show that deleting outliers improves the accuracy of the credit scoring model. Equal-interval binning, equal-frequency binning, and an optimal discretization method are employed to discretize the continuous variables; the experimental results show that discretization contributes to privacy protection and improves the accuracy of the credit scoring model.
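To make the difference between the two simpler discretization schemes named above concrete, here is a minimal sketch of equal-interval and equal-frequency binning. It is not the dissertation's implementation: the function names and the toy income values are hypothetical, and the optimal discretization method is not shown.

```python
from bisect import bisect_right


def equal_interval_edges(values, n_bins):
    """Cut points that split the value range into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Cut points chosen so each bin holds roughly the same number of samples."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def bin_counts(values, edges):
    """Count how many values fall into each bin defined by the cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


# Hypothetical monthly incomes with one extreme value (illustrative data only).
incomes = [12, 15, 18, 20, 22, 25, 30, 35, 40, 200]
interval_counts = bin_counts(incomes, equal_interval_edges(incomes, 5))
frequency_counts = bin_counts(incomes, equal_frequency_edges(incomes, 5))
```

On skewed data like this, equal-interval binning puts nearly every sample in the first bin because the single extreme value stretches the range, whereas equal-frequency binning spreads the samples evenly. That sensitivity is one reason outlier handling and the choice of discretization scheme interact.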
This dissertation also discusses the problem of structural imbalance in credit data from the perspectives of data and algorithm. The experimental results show that the oversampling method, the SMOTE method, the KNN-based SMOTE method, and the cost-sensitive method are all effective in improving the ability of the credit scoring model to recognize bad customers, and that the KNN-based SMOTE method outperforms the others in dealing with unbalanced data. Based on a comparison of different methods of ranking features by importance, this dissertation proposes a hybrid method that takes all of the ranking information into account and achieves a more robust ranking. Several feature selection methods from machine learning are tested to obtain a credit scoring index system.
Five of the most commonly used single credit scoring models are applied and compared. The results show that the Logistic regression model has better stability and interpretability, but its classification accuracy is slightly lower than that of the artificial intelligence models. The C4.5 decision tree model makes no strict assumptions about the data, and its results are easy to understand. Its drawbacks are a lack of stability and a dependence on expert knowledge and experience. The advantage of the Bayes net model is its high stability and interpretability, but its disadvantage is low classification accuracy. The BP neural network model can deal with all kinds of data types and has high classification accuracy, but its results are not robust and cannot be interpreted. The performance of the support vector machine model is quite similar to that of the BP neural network. All in all, statistical and artificial intelligence models each have their advantages and disadvantages, and no model outperforms the others in both stability and accuracy.
To obtain a credit scoring model with both high accuracy and stability, the dissertation proposes a variety of ensemble models.
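As an illustration of the oversampling idea discussed above for imbalanced credit data, the following is a minimal sketch of the basic SMOTE procedure: each synthetic "bad customer" is interpolated between a minority sample and one of its k nearest minority neighbors. It is plain SMOTE under simplifying assumptions, not the KNN-based SMOTE variant evaluated in the dissertation, and the function name, parameters, and toy data are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by interpolating between neighbors.

    minority: list of numeric feature tuples (at least two samples).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen sample (excluding itself).
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Random point on the line segment between base and neighbor.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical minority-class samples: (debt ratio, number of late payments).
bad_customers = [(0.80, 4.0), (0.85, 5.0), (0.90, 3.0), (0.75, 6.0)]
new_samples = smote(bad_customers, n_synthetic=6, k=2)
```

Because every synthetic point lies between two existing minority samples, the method enlarges the minority class without simply duplicating records, which is the usual motivation for preferring it over naive oversampling.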
Experimental results show that combining the Logistic regression model and an artificial intelligence model in series helps to enhance stability and accuracy, but loses interpretability because of multicollinearity. A combination of seven different credit scoring models also shows good performance in stability and accuracy, but its drawback is the complexity of the modeling process. Bagging and boosting ensemble models are worth promoting in credit scoring, because they can be easily established and have satisfactory accuracy and stability. The RSM ensemble model achieves results similar to those of the bagging and boosting ensemble models on higher-dimensional data, but is not suitable for credit scoring with lower-dimensional data. The experimental results of the clustering-based bagging ensemble model show that clustering helps to enhance the accuracy and diversity of the base classifiers when an appropriate clustering level is chosen. The experimental results of the clustering-based selective ensemble model show that the clustering method only works for the bagging ensemble model and has little effect on the boosting ensemble model.
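The bagging approach mentioned above can be sketched in a few lines: train several base classifiers on bootstrap resamples of the training data and combine them by majority vote. This is a minimal illustration only. The decision-stump base learner, the function names, and the toy data are assumptions made for the example; the dissertation's own base classifiers and its clustering-based variants are not reproduced here.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-feature threshold rule with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for above in (0, 1):
                # Predict `above` when the feature exceeds t, else the other class.
                preds = [above if row[j] > t else 1 - above for row in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, above)
    _, j, t, above = best
    return lambda row: above if row[j] > t else 1 - above


def fit_bagging(X, y, n_models=15, seed=0):
    """Train n_models stumps, each on a bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Combine the base classifiers by majority vote."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical applicants: (debt ratio, age); label 1 marks a bad customer.
X = [(0.1, 30), (0.2, 45), (0.3, 25), (0.4, 50),
     (0.6, 35), (0.7, 28), (0.8, 40), (0.9, 33)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble = fit_bagging(X, y)
```

Each stump sees a slightly different resample, so individual thresholds vary, but the vote averages out that variation. This is the mechanism usually credited for the stability of bagging relative to a single unstable learner such as a decision tree.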
Keywords/Search Tags: Personal credit, Credit Scoring, Ensemble model, Personal business