
Imbalanced Structure Processing And Feature Selection Of Personal Credit Data

Posted on: 2019-12-31 | Degree: Master | Type: Thesis
Country: China | Candidate: R P Shi
GTID: 2429330566493779 | Subject: Applied statistics
Abstract/Summary:
The data for this thesis are the historical business records of lending institutions provided in Topic 2 of the "Dongzheng Future Cup" China Undergraduate Statistical Contest in Modeling. First, missing data are handled according to the missing rate, using the deletion method and then multiple imputation, and nominal variables are handled with the special-category method. Among the approaches compared for data imbalance, combined sampling, which pairs SMOTE oversampling with K-Means-based undersampling, gives the best prediction performance. Second, for variable selection, the lasso estimation of logistic regression is improved. Four credit scoring models are used to compare the variable selection methods; depending on the characteristics of each model, the improved selection method raises prediction performance to different degrees. Finally, the experimental comparison shows that the random forest achieves high classification accuracy; logistic regression has slightly lower overall classification accuracy than the other models but identifies the minority class better than the typical model; and the decision tree is slightly less accurate than the random forest and also has a low recognition rate for the minority class.
Keywords/Search Tags: Credit score, Variable selection, Imbalanced data, Random forest
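
The combined sampling idea named in the abstract (SMOTE oversampling of the minority class together with K-Means-based undersampling of the majority class) can be illustrated with a minimal, standard-library-only sketch. This is not the thesis's implementation: the function names, the choice to replace the majority class by its K-Means cluster centroids, the `target` class size, and the toy Gaussian data are all assumptions made for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as lists of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_oversample(minority, n_new, k=3, rng=None):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a random minority point and one of its k nearest minority
    neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def kmeans_undersample(majority, n_clusters, iters=20, rng=None):
    """Assumed undersampling variant: run Lloyd's K-Means on the majority
    class and keep only the n_clusters centroids as its representatives."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(majority, n_clusters)]
    for _ in range(iters):
        clusters = [[] for _ in range(n_clusters)]
        for p in majority:
            nearest = min(range(n_clusters), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def combined_sample(majority, minority, target, k=3, seed=0):
    """Bring both classes to `target` points: grow the minority class with
    SMOTE and shrink the majority class with K-Means centroids.
    Assumes len(minority) <= target <= len(majority)."""
    rng = random.Random(seed)
    extra = smote_oversample(minority, target - len(minority), k, rng)
    reduced = kmeans_undersample(majority, target, rng=rng)
    return reduced, minority + extra


# Toy data (hypothetical): 200 majority points and 20 minority points in 2-D.
data_rng = random.Random(42)
majority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
minority = [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(20)]

maj_bal, min_bal = combined_sample(majority, minority, target=60)
print(len(maj_bal), len(min_bal))  # prints "60 60"
```

In this sketch both classes end up with the same number of points, which a classifier such as a random forest or logistic regression could then be trained on. Whether the thesis balances the classes exactly, and how it picks the number of clusters, is not stated in the abstract.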