Study On Classification Algorithms For Bank’s Bankruptcy With Imbalanced Data

Posted on: 2013-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: W S Zhong
GTID: 2249330374975444
Subject: Probability theory and mathematical statistics

Abstract/Summary:
The bankruptcy of a bank, especially a large one, can trigger a crisis across the whole banking industry. A crisis in the banking system can cause huge losses to the national economy or even lead to a financial crisis. The bankruptcy of Lehman Brothers in 2008, for example, set off a series of chain reactions that ended in the financial tsunami that struck the world. Strengthening the supervision of commercial banks, especially those facing bankruptcy, giving them early warning, and even closing banks that are on the verge of bankruptcy would therefore benefit the financial environment as a whole.

Under normal economic conditions most banks are healthy and only a small fraction face bankruptcy. Bankruptcy prediction is thus a two-class classification problem: classifying banks as healthy or unhealthy. Because a bank approaching bankruptcy shows it in its financial data, bankruptcy research focuses on those data. Since healthy banks far outnumber banks facing bankruptcy, the problem is also an imbalanced classification problem. This thesis studies classification algorithms for imbalanced data based on bank financial data.

The data used in this thesis come from the website of the Federal Reserve Bank of Chicago and are publicly available for download. The dataset contains financial data for a large number of banks, with thousands of attributes per bank. Using these raw data directly for machine learning is inefficient, so this study uses financial ratios derived from them instead.
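Why the imbalance matters can be shown with a few lines of arithmetic. The sketch below uses hypothetical counts (980 healthy banks, 20 failing banks), not figures from the thesis, and hypothetical helper names. It compares plain accuracy with recall on the failing (minority) class for two made-up prediction vectors.

```python
from typing import List, Tuple


def confusion(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn), treating label 1 (failing bank) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    tp, fn, fp, tn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + fn + fp + tn)


def minority_recall(y_true: List[int], y_pred: List[int]) -> float:
    """Share of failing banks that the classifier actually flags."""
    tp, fn, _, _ = confusion(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical sample: 980 healthy banks (0) and 20 failing banks (1).
y_true = [0] * 980 + [1] * 20
always_healthy = [0] * 1000                        # majority-class "classifier"
flags_too_many = [0] * 950 + [1] * 30 + [1] * 20   # 30 false alarms, no misses

print(accuracy(y_true, always_healthy), minority_recall(y_true, always_healthy))  # 0.98 0.0
print(accuracy(y_true, flags_too_many), minority_recall(y_true, flags_too_many))  # 0.97 1.0
```

A classifier that never predicts bankruptcy scores 98% accuracy while missing every failing bank, which is why imbalanced problems call for either rebalancing the data or changing how errors are weighted.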
To better forecast bank bankruptcy, this study applies data mining techniques. Once the financial data are converted into a standard machine learning format, they can be used for training and prediction, and the performance of the algorithms can be analyzed.

Because of the nature of imbalanced data, current research on imbalanced classification focuses on two areas: data processing and algorithm improvement. Data processing rebalances the class distribution of the data; algorithm improvement assigns a different misclassification weight to each class. To improve the poor performance on the minority class, this thesis combines data processing methods, namely random over-sampling, random under-sampling, and SMOTE over-sampling, with an SVM to build the classifier. In addition, cost sensitivity is introduced into the SVM, modifying the algorithm to find the classifier with the lowest misclassification cost. Experimental results on the banking data show that random over-sampling performs best in terms of accuracy, while the cost-sensitive SVM performs best when performance is measured by cost.
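The three data-processing methods named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's code: the function names are hypothetical, `X` is assumed to be a list of financial-ratio rows and `y` a list of 0/1 labels with 1 marking the minority (failing) class, and each method resamples until the two classes are equal in size. The `smote` function is a simplified variant that picks a random minority row each time, finds its `k` nearest minority neighbours by Euclidean distance, and places a synthetic row at a random point on the segment between them.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def random_oversample(X: Matrix, y: List[int], minority: int = 1,
                      seed: int = 0) -> Tuple[Matrix, List[int]]:
    """Duplicate randomly chosen minority rows until both classes are equally large."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_majority - len(minority_idx))]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]


def random_undersample(X: Matrix, y: List[int], minority: int = 1,
                       seed: int = 0) -> Tuple[Matrix, List[int]]:
    """Keep every minority row and an equally large random subset of majority rows."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    kept = minority_idx + rng.sample(majority_idx, len(minority_idx))
    return [X[i] for i in kept], [y[i] for i in kept]


def smote(X: Matrix, y: List[int], minority: int = 1, k: int = 5,
          seed: int = 0) -> Tuple[Matrix, List[int]]:
    """Append synthetic minority rows until the classes are balanced.

    Each synthetic row lies between a random minority row and one of its k nearest
    minority neighbours (Euclidean distance). Needs at least two minority rows.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    synthetic: Matrix = []
    for _ in range(n_needed):
        i = rng.randrange(len(minority_rows))
        base = minority_rows[i]
        # k nearest minority neighbours of `base`, excluding base itself
        others = [r for j, r in enumerate(minority_rows) if j != i]
        neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return X + synthetic, y + [minority] * len(synthetic)
```

Random over-sampling only repeats existing failing banks, random under-sampling discards healthy ones, and SMOTE creates new points between neighbouring failing banks, which avoids exact duplicates. In practice any of these should be applied to the training split only, so that the test set keeps the real class ratio.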
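The cost-sensitive idea can be illustrated with a linear SVM whose hinge loss is weighted per class, so that misclassifying a failing bank costs more than misclassifying a healthy one. The sketch below is an assumption-laden stand-in, not the thesis's modified SVM: it is linear only (no kernel), trains by full-batch subgradient descent, uses labels -1 (healthy) and +1 (failing), and the names `train_cost_sensitive_linear_svm`, `predict`, `total_cost` and the `cost` dictionary are hypothetical. Established SVM libraries expose the same idea as per-class penalties, for example LIBSVM's `-wi` option or the `class_weight` parameter of scikit-learn's `SVC`.

```python
from typing import Dict, List, Tuple


def train_cost_sensitive_linear_svm(
    X: List[List[float]], y: List[int], cost: Dict[int, float],
    lam: float = 0.01, lr: float = 0.1, epochs: int = 300,
) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the class-weighted hinge loss

        lam/2 * ||w||^2 + (1/n) * sum_i cost[y_i] * max(0, 1 - y_i * (w.x_i + b))

    with labels y_i in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active for this sample
                c = cost[yi]  # class-specific penalty
                for j in range(d):
                    grad_w[j] -= c * yi * xi[j] / n
                grad_b -= c * yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Sign of the decision function; ties go to the failing class (+1)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def total_cost(y_true: List[int], y_pred: List[int], cost: Dict[int, float]) -> float:
    """Sum of cost[true label] over misclassified samples."""
    return sum(cost[t] for t, p in zip(y_true, y_pred) if t != p)
```

Raising `cost[1]` relative to `cost[-1]` pushes the decision boundary toward the healthy class, trading more false alarms for fewer missed failures; `total_cost` gives a simple way to score classifiers by misclassification cost rather than by accuracy.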
Keywords/Search Tags: Imbalanced data, Bank Bankruptcy, Support Vector Machine, SMOTE, Cost-Sensitive