Imbalanced Structure Processing And Feature Selection Of Personal Credit Data

Posted on:2019-12-31

Degree:Master

Type:Thesis

Country:China

Candidate:R P Shi

Full Text:PDF

GTID:2429330566493779

Subject:Applied statistics

Abstract/Summary:

The data for this paper are the historical business records of lending institutions provided in Topic 2 of the "Dongzheng Future Cup" China Undergraduate Statistical Contest in Modeling. First, missing data are handled according to each variable's missing rate, using the deletion method and then multiple imputation, and nominal variables are handled with the special category method. Among the methods compared for dealing with data imbalance, combined sampling, which pairs SMOTE oversampling with K-Means-based undersampling, gives the best prediction performance. Second, for variable selection, the lasso estimation of logistic regression is improved. Four credit scoring models are used to compare the variable selection methods. Depending on the characteristics of each model, the improved variable selection method improves the prediction results to different degrees. Finally, the experimental comparison shows that the random forest has high classification accuracy; the overall classification accuracy of logistic regression is slightly lower than that of the other models, but its identification of minority-class samples is better than that of the other models; and the classification accuracy of the decision tree model is slightly lower than that of the random forest, and its recognition rate for minority-class samples is also low.
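The combined sampling idea mentioned in the abstract can be illustrated with a small dependency-free sketch. The abstract does not give the thesis's exact procedure, so everything below is an assumption made for illustration: the function names (`smote`, `kmeans_undersample`, `combined_sample`), the neighbour count `k`, the common class size `target`, the choice to keep the real majority sample nearest each K-Means centroid, and the synthetic two-feature data.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours.
    Assumes the minority class has at least two points."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def kmeans_undersample(majority, n_keep, n_iter=20, rng=None):
    """Shrink the majority class to n_keep points: run K-Means with n_keep
    clusters, then keep the real sample closest to each centroid
    (an illustrative choice; the same sample may be picked twice)."""
    if rng is None:
        rng = random.Random(0)
    centroids = [list(p) for p in rng.sample(majority, n_keep)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(n_keep)]
        for p in majority:
            nearest = min(range(n_keep), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return [min(majority, key=lambda p: math.dist(p, c)) for c in centroids]


def combined_sample(majority, minority, target, k=3, seed=0):
    """Bring both classes to `target` samples: oversample the minority with
    SMOTE and undersample the majority with K-Means."""
    rng = random.Random(seed)
    new_min = minority + smote(minority, target - len(minority), k, rng)
    new_maj = kmeans_undersample(majority, target, rng=rng)
    X = new_maj + new_min
    y = [0] * len(new_maj) + [1] * len(new_min)  # 1 marks the minority class
    return X, y


# Hypothetical data: 200 majority and 20 minority records with two features.
data_rng = random.Random(42)
majority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
minority = [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(20)]

X, y = combined_sample(majority, minority, target=60)
print(len(X), sum(y))
```

In practice a library implementation (for example, one built on NumPy) would be faster; the pure-Python version is used here only so the two steps are visible in full.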

Keywords/Search Tags:

Credit score, Variable selection, Imbalance data, Random forest

Related items

1	Analysis Of The Promotion Strategy For Pet Company
2	Research On Credit Score Model For Credit Card Customer
3	Credit Risk Assessment Method Based On Random Forest In P2P Lending
4	Research On Internet Financial Fraud Recognition Based On Large Data
5	Research On Credit Evaluation Of Small And Micro Enterprises
6	An Empirical Study On Enterprise Credit Risk Assessment Of A Commercial Bank Based On Random Forest
7	Application Of Random Forest In Credit Risk Evaluation Of P2P Lending
8	Comparative Analysis Of Personal Credit Evaluation Based On Random Forest And Back Propagation Neural Network
9	Research On Credit Card Delinquent Behavior Based On Random Forest Classification
10	The Personal Credit Risk Assessment Of P2P Platform Is Based On Random Forest Model