Clustering CBR Based on Imbalanced Data Sets for Business Failure Warning

Posted on: 2013-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: J L Yu
GTID: 2249330374493422
Subject: Business management
Abstract/Summary:
Case-based reasoning (CBR) is one of the main forecasting methods in business prediction: it predicts well and can explain its results. In business failure prediction (BFP), failed enterprises are fewer than non-failed ones, yet the loss caused by a failed enterprise is huge. It is therefore necessary to find a method that, trained on an imbalanced data set, forecasts this small proportion of failed enterprises well while retaining high accuracy. Commonly used methods assume balanced data sets and do not perform well on the minority samples of an imbalanced data set made up of minority (failed) and majority (non-failed) enterprises.

This paper proposes a method called Clustering CBR (CCBR), which integrates clustering analysis into CBR to address this problem. In CCBR, several case classes are first generated through hierarchical clustering, and the center of each class is calculated by a defined method. To predict the label of a target case, its nearest case class is retrieved by ranking the similarity between the target case and each class center. Several nearest neighbors of the target case are then retrieved from that pre-selected class, and the target case's label is predicted by a vote over the labels of the retrieved cases.

The paper tested CCBR on four imbalanced data sets and compared it with traditional CBR, the support vector machine, the LOGIT method, and the MDA method. The results show that CCBR performs significantly better than these four methods in terms of recall.

In real-world prediction based on imbalanced samples, the minority class commonly plays an important role. The capability to identify minority samples in an imbalanced data set directly reflects the performance and value of a proposed method.
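The retrieval procedure described in the abstract (cluster the case base, pick the nearest class center, then vote among the nearest cases inside that class) can be illustrated with a minimal sketch. It is not the thesis's implementation. The abstract does not state how class centers or similarity are defined, so the arithmetic-mean center, the Euclidean distance, the centroid-based merge rule, and all function names and toy data below are assumptions made for illustration only.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Distance used as an (assumed) inverse similarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Assumed class center: the arithmetic mean of the member cases."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def hierarchical_clusters(cases, k):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    centroids are closest until k clusters remain. Returns index lists."""
    clusters = [[i] for i in range(len(cases))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(centroid([cases[m] for m in clusters[i]]),
                              centroid([cases[m] for m in clusters[j]]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def ccbr_predict(cases, labels, target, k_clusters=2, n_neighbors=3):
    """Predict the target's label from neighbors in its nearest case class."""
    clusters = hierarchical_clusters(cases, k_clusters)
    centers = [centroid([cases[m] for m in c]) for c in clusters]
    # Step 1: retrieve the case class whose center is closest to the target.
    nearest = min(range(len(clusters)),
                  key=lambda c: euclidean(target, centers[c]))
    # Step 2: retrieve the nearest neighbors inside that class only.
    members = sorted(clusters[nearest],
                     key=lambda m: euclidean(target, cases[m]))
    # Step 3: majority vote over the retrieved cases' labels.
    votes = Counter(labels[m] for m in members[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical, already-normalized toy case base: 6 non-failed, 2 failed.
cases = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.30, 0.20],
         [0.25, 0.30], [0.20, 0.30], [0.90, 0.80], [0.85, 0.90]]
labels = ["non-failed"] * 6 + ["failed"] * 2

print(ccbr_predict(cases, labels, [0.80, 0.85]))
```

Restricting the neighbor search to one case class is what keeps the majority class from outvoting the minority: in a plain nearest-neighbor vote over the whole case base, a larger neighbor count would start pulling in non-failed cases.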
Traditional data processing methods that aim to eliminate the imbalance of the data sets may lead to over-fitting of the training samples, loss of useful information, and a weak reflection of the real data distribution. In contrast, adjusting specific algorithms to work on imbalanced data sets has no such shortcomings. CCBR is obtained by altering traditional CBR with clustering analysis so that more accurate information is retrieved for prediction on imbalanced samples in business prediction. Comparative analysis of the results verifies the performance of CCBR: it is more capable of identifying minority samples than the four benchmark methods. It can also predict the business status of a firm one or two years in advance by applying the model to the data set t-1, whose features are taken one year before business failure, and the data set t-2, whose features are taken two years before business failure. Warnings can thus be given to a firm's operators and stockholders in advance if there are signs of failure.

The paper is organized as follows. First, the background and significance of the research are pointed out. The paper then gives an overview of prediction based on imbalanced data sets, the performance of CBR, the application of clustering analysis in CBR, and the forecasting methods used in business failure prediction, and states how its own research relates to this work. Second, it explores feature selection and normalization methods and introduces the general principle of CCBR, explaining case class generation, the choice of the number of clusters, the retrieval of clusters and cases, and the warning mechanism. Finally, the initial case base is presented, feature normalization and performance evaluation are demonstrated, and the experimental results and analysis of the compared methods are reported. The practical value of CCBR is preliminarily verified by applying it to 20 target cases.
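Because the comparison is reported in terms of recall on the minority class, a small sketch of that metric may help show why overall accuracy alone is misleading on imbalanced data. The labels and predictions below are hypothetical and are not results from the thesis; the function names are likewise illustrative.

```python
def recall(y_true, y_pred, positive="failed"):
    """Share of truly positive (failed) cases that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive cases in y_true")
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Share of all cases whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced test set: 8 non-failed firms, 2 failed firms.
y_true = ["non-failed"] * 8 + ["failed"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["non-failed"] * 10

print(accuracy(y_true, y_pred), recall(y_true, y_pred))
```

The always-majority classifier looks acceptable by accuracy while missing every failed firm, which is the situation a recall-based comparison is meant to expose.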
Keywords/Search Tags: Imbalanced Data Sets, Clustering CBR, Business Failure Warning