Empirical Research On The Credit Risk Based On Random Forest Algorithm

Posted on: 2019-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: S H Chen
Full Text: PDF
GTID: 2429330566477576
Subject: Applied Statistics
Abstract/Summary:
With the rapid development of science and technology, the 21st century has become a mobile Internet society centered on digitization, informatization and networking, which has led to the emergence of a new form of Internet lending: peer-to-peer (P2P) lending. In recent years, with the rapid development of China's economy, people's income levels have risen significantly, and people increasingly use social lending resources to transfer personal funds. As a result, P2P online lending platforms, which offer high yields and easy borrowing, have become popular. At the same time, a variety of individual consumer loans, including revolving loans, home loans, auto loans, loans for studying abroad and loans for business services, have gradually developed. However, because domestic P2P lending started late, personal credit data are incomplete, risk analysis technology is backward, and the related laws and regulations are still imperfect, the personal credit risk assessment system cannot yet meet the needs of credit institutions, and investors face serious financial security issues. How to establish a sound risk control model for online lending is a pressing problem in China's credit industry.

In this dissertation, we use the Random Forest (RF) algorithm, one of the common ensemble classifier models in machine learning, to study credit risk assessment. The RF algorithm combines the advantages of Bagging and decision trees, improves the generalization performance of the classifier, and gives stable results. Compared with single classification algorithms, RF is less prone to over-fitting and handles credit risk assessment problems better. Considering the practical setting, we also propose a weighted random forest model based on the RF algorithm, which improves accuracy on the category with higher misclassification costs and makes the model more practical.

First, the original data are preprocessed: outliers are eliminated, missing values are filled in, invalid features are deleted, the data are normalized, and the correlation between variables is checked. Then, we use five-fold cross-validation and the RF algorithm to reselect the features. In the empirical research phase, we establish a credit risk assessment model based on random forest using the ppdai open datasets. Next, we compare the RF model with other single credit risk assessment models, namely SVM, ANN, KNN and logistic regression. Our experiments show that the RF model classifies better than these models, indicating that the random forest algorithm is more suitable for building a P2P online loan risk assessment model. Finally, we use the SMOTE oversampling method to address the class imbalance in the P2P data, making the classification results of the model more realistic.
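
The SMOTE oversampling step mentioned at the end of the abstract can be sketched as follows. This is a minimal illustration using only the Python standard library, not the implementation used in the thesis: the function name `smote`, the parameters `n_new`, `k` and `seed`, and the toy data are all assumptions made for the example. The idea is that each synthetic minority-class sample (for instance, a defaulted loan) is placed at a random point on the line segment between an existing minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    k: number of nearest minority neighbours to choose from (illustrative default).
    seed: seed for the random generator, so the sketch is reproducible.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample,
        # excluding the base sample itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority-class data (made up for illustration): two scaled features.
defaults = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.7, 0.3]]
new_points = smote(defaults, n_new=6)
```

In practice, oversampling of this kind is normally applied to the training folds only, so that synthetic samples do not leak into the data used to evaluate the model.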
Keywords/Search Tags: P2P, Random Forest, Risk Assessment, SMOTE