Study On Classification Algorithms For Bank’s Bankruptcy With Imbalanced Data

Posted on:2013-09-22

Degree:Master

Type:Thesis

Country:China

Candidate:W S Zhong

Full Text:PDF

GTID:2249330374975444

Subject:Probability theory and mathematical statistics

Abstract/Summary:

The bankruptcy of a bank, especially a large one, may lead to a crisis across the whole banking industry. A crisis in the banking system may cause huge losses to the economy of the whole country, or even lead to a financial crisis: the bankruptcy of Lehman Brothers in 2008, for example, caused a series of chain reactions, and finally the financial tsunami struck the world. Strengthening the monitoring of commercial banks, giving early warning to banks facing bankruptcy, or even closing banks that are on the verge of bankruptcy, therefore benefits the whole financial environment.

Under normal economic conditions, most banks are healthy and only a small share of them face bankruptcy. In other words, this is a two-class classification problem: classify banks as healthy or unhealthy. A bank facing bankruptcy will show it in its financial data, so research on bank bankruptcy focuses on financial data. Because the number of healthy banks is much larger than the number of banks facing bankruptcy, the problem is also an imbalanced classification problem. This thesis focuses on classification algorithms for imbalanced data, based on banking financial data.

The data used in this thesis come from the website of the Federal Reserve Bank of Chicago, where anyone can download them. The dataset contains financial data for a large number of banks, with thousands of attributes for each bank. Using the raw data directly for machine learning is inefficient, so our research uses financial ratios instead. To forecast bank bankruptcy better, data mining techniques are applied in this study. After converting the financial data into a standard machine learning format, we use it for learning and prediction and analyze the performance of the algorithms.

Owing to the nature of imbalanced data, the academic community currently studies imbalanced classification in two main areas: data processing and algorithm improvement. Data processing rebalances the imbalanced data; algorithm improvement assigns a different misclassification weight to each class. To improve the poor performance on the minority class, this thesis combines data-processing methods (random over-sampling, random under-sampling, and SMOTE over-sampling) with SVM to build the classifier. In addition, cost sensitivity is introduced into SVM, and the SVM algorithm is revised to find the classification with the lowest cost. Experimental results show that, on the banking data, random over-sampling performs best when judged by accuracy, while the cost-sensitive method performs best when judged by cost.
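
The three data-processing methods named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the toy "financial ratio" values, and the choice of k are all assumptions made for illustration, and the SMOTE function is a simplified version that interpolates between a minority row and one of its nearest minority neighbours.

```python
import random

def random_oversample(minority, majority, rng):
    """Duplicate randomly chosen minority rows until both classes have equal size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra

def random_undersample(minority, majority, rng):
    """Keep a random subset of the majority class, the same size as the minority class."""
    return rng.sample(majority, len(minority))

def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority rows by interpolating between a row
    and one of its k nearest minority neighbours (squared Euclidean distance)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

rng = random.Random(0)
# Toy data (hypothetical): two made-up financial ratios per bank.
healthy = [[rng.gauss(1.0, 0.2), rng.gauss(0.1, 0.05)] for _ in range(20)]
failing = [[0.40, -0.05], [0.50, -0.02], [0.45, -0.08], [0.55, -0.01]]

oversampled = random_oversample(failing, healthy, rng)   # minority grown to 20 rows
undersampled = random_undersample(failing, healthy, rng) # majority cut to 4 rows
smoted = failing + smote(failing, len(healthy) - len(failing), k=2, rng=rng)
print(len(oversampled), len(undersampled), len(smoted))  # 20 4 20
```

In each case the rebalanced minority or majority set would then be passed, together with the other class, to the SVM training step. Random over-sampling only repeats existing rows, while the SMOTE-style function produces new rows that lie between existing minority rows.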

Keywords/Search Tags:

Imbalanced data, Bank Bankruptcy, Support Vector Machine, SMOTE, Cost-Sensitive

Related items

1	Modeling And Application Of Support Vector Machine Based On Grey Incidence Analysis And Improved SMOTE
2	Research And Application Of Support Vector Machine On Imbalanced Data Classification
3	Imbalance-Oriented Study On Enterprise’s Financial Distress Prediction
4	A Credit Card Customer Segmentation Model Based On Improved Support Vector Machine
5	Support Vector Machine (SVM) Based On Feature Engineering Application For Non-life Insurance Bankruptcy Prediction
6	Research On Evaluation Of Credit For Farmers Based On Support Vector Machines
7	Research On Enterprise Bankruptcy Prediction Based On Data Mining Technology
8	Research On P2P Credit Evaluation Method Based On Machine Learning
9	Demonstration Study Of Customer Churn Prediction Based On Data Mining
10	Bank Loan Classification Preliminary Study Based On The Support Vector Machines