
Loan Default Prediction Based On Imbalanced Data Classification

Posted on: 2014-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: L F Zhou
GTID: 2269330425973233
Subject: Quantitative Economics
Abstract/Summary:
Evaluating loan default risk and estimating the probability of default are the foundation of credit risk management in modern financial institutions. Loan default prediction and the related problem of credit scoring are also active research topics in econometrics and finance. Most loan default data are imbalanced, yet previous studies have either ignored this problem or paid little attention to it. Drawing on research in imbalanced classification, and with large data sets in mind, we adopt random forests, which are parallelizable by nature, as the classification method. In this thesis we propose an improved random forest algorithm, WPBRF (Weighted and Parallelable Balanced Random Forest), which assigns a weight to each decision tree in the forest when the trees are aggregated for prediction. These weights are easily calculated from the out-of-bag errors obtained during training. WPBRF also exploits parallel computing and greatly reduces the training time of a single decision tree. Experimental results show that the proposed WPBRF algorithm outperforms the original random forest and other popular classification algorithms such as SVM, KNN and C4.5 in terms of both balanced and overall accuracy. The experiments also show that parallel random forests can greatly improve the efficiency of random forests during the learning process.
Keywords/Search Tags: Loan default prediction, Imbalanced data, Random forests, Parallel computing
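
The abstract names three ingredients: class-balanced sampling, tree weights derived from out-of-bag (OOB) errors, and independent (hence parallelizable) tree training. The sketch below illustrates how these could fit together. It is not the thesis's WPBRF implementation, and several choices are assumptions made for illustration: one-split decision stumps stand in for full decision trees, label 1 is taken to be the minority (default) class, each tree's weight is set to its OOB balanced accuracy (the abstract does not give the exact formula), and all function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls; classes absent from y_true are skipped."""
    recalls = []
    for c in (0, 1):
        idx = [i for i, label in enumerate(y_true) if label == c]
        if idx:
            recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls) if recalls else 0.0


def predict_stump(stump, row):
    feature, threshold, polarity = stump
    return 1 if polarity * (row[feature] - threshold) > 0 else 0


def fit_stump(X, y, feature):
    """Pick the threshold and direction with the best balanced accuracy."""
    best_score, best_stump = -1.0, None
    for threshold in sorted({row[feature] for row in X}):
        for polarity in (1, -1):
            stump = (feature, threshold, polarity)
            preds = [predict_stump(stump, row) for row in X]
            score = balanced_accuracy(y, preds)
            if score > best_score:
                best_score, best_stump = score, stump
    return best_stump


def train_one(X, y, seed):
    """Train one stump on a balanced bootstrap and weight it by OOB accuracy."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]  # assumed: 1 = default
    majority = [i for i, label in enumerate(y) if label == 0]
    # Balanced bootstrap: draw as many rows from each class as there are minority rows.
    bag = ([rng.choice(minority) for _ in minority]
           + [rng.choice(majority) for _ in minority])
    feature = rng.randrange(len(X[0]))  # random feature, as in random forests
    stump = fit_stump([X[i] for i in bag], [y[i] for i in bag], feature)
    # Rows never drawn into the bag are out-of-bag for this stump.
    in_bag = set(bag)
    oob = [i for i in range(len(X)) if i not in in_bag]
    # Assumed weighting rule: weight = OOB balanced accuracy (0.0 if no OOB rows).
    weight = balanced_accuracy([y[i] for i in oob],
                               [predict_stump(stump, X[i]) for i in oob])
    return stump, weight


def train_forest(X, y, n_trees=25, workers=4):
    """Trees do not depend on one another, so they can be trained concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda seed: train_one(X, y, seed), range(n_trees)))


def predict(forest, row):
    """Weighted majority vote over all stumps."""
    total = sum(weight for _, weight in forest)
    votes = sum(weight for stump, weight in forest if predict_stump(stump, row) == 1)
    return 1 if votes > total / 2 else 0
```

A thread pool is used here only to show that the per-tree work is independent. In CPython, threads do not speed up CPU-bound training, so a process pool or a distributed framework would be the more realistic choice for an actual speed-up.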