
Application Research Of Hybrid Model Based On Ensemble Learning In Personal Credit Evaluation

Posted on: 2018-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: M J Lou
GTID: 2436330515980570
Subject: Applied statistics
Abstract/Summary:
In recent years, China's total consumer lending has sustained rapid growth of more than 20%, and the Internet giants have turned their attention to the potential of the consumer finance market. Accuracy and efficiency are the core competitive strengths of major financial institutions, and sustaining them depends on new financial technology, of which big data is representative. Financial institutions therefore urgently need an accurate and efficient personal credit scoring system that improves their risk control ability. Building on related research in China and abroad, this paper proposes a hybrid model based on ensemble learning: elastic net logistic regression, random forest, and XGBoost are selected as the three base models and combined through the Super Learner algorithm into a hybrid model for personal credit scoring. The empirical part uses basic personal information, consumption behavior, bank repayment records, and other data on nearly 60,000 loan users from an Internet financial intelligence platform. Guided by model theory and the actual business background, we establish a scientific and effective feature engineering framework covering feature extraction, missing-value processing, and data standardization, in order to extract as much information from the data as possible. This yields a training set of 38,917 samples and a test set of 16,679 samples, each with 343 features. Training on the training set gives three single models and a hybrid model obtained through the Super Learner algorithm. On the test set, we evaluate the models by their KS and AUC values and compare the base models with the hybrid model. The hybrid model outperforms the base models on both metrics. The paper closes by summarizing conclusions on the importance of feature engineering and the superiority of the hybrid model, and proposes directions for follow-up research based on an analysis of this paper's shortcomings.
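The abstract compares models by their KS and AUC values on the test set. As a minimal sketch of how these two metrics can be computed from predicted scores, the Python below uses the standard definitions: KS as the largest gap between the true positive rate and the false positive rate over all score thresholds, and AUC as the area under the ROC curve. The function name `ks_and_auc`, the label convention (1 for the positive class, for example a defaulting borrower), and the toy data are assumptions made for illustration; they are not taken from the thesis.

```python
def ks_and_auc(labels, scores):
    """Return (KS, AUC) for binary labels (1 = positive class) and scores.

    Hypothetical helper for illustration only, not the thesis code.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Sweep thresholds from the highest score to the lowest.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0          # positives / negatives scored at or above the threshold
    ks = 0.0
    auc = 0.0
    i = 0
    while i < len(pairs):
        tp_prev, fp_prev = tp, fp
        j = i
        # Handle tied scores as one group so ties do not distort the curve.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        # Trapezoid between consecutive ROC points (FPR on x, TPR on y).
        auc += (fp - fp_prev) / n_neg * (tp + tp_prev) / (2 * n_pos)
        # KS is the largest vertical gap between TPR and FPR.
        ks = max(ks, abs(tp / n_pos - fp / n_neg))
        i = j
    return ks, auc


# Toy data, made up for this example.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
ks, auc = ks_and_auc(toy_labels, toy_scores)
print(f"KS={ks:.3f} AUC={auc:.3f}")  # KS=0.667 AUC=0.889
```

Under this sketch, a model that ranks more positives above negatives scores higher on both metrics, which is the sense in which the hybrid model is reported to outperform the base models.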
Keywords/Search Tags: credit scoring, model ensemble, Elastic net, Logistic regression, random forest, XGBoost, Super Learner