
Research On Multi-class Imbalance Learning

Posted on: 2018-09-15  Degree: Master  Type: Thesis
Country: China  Candidate: J J Bi  Full Text: PDF
GTID: 2348330518463634  Subject: Computer application technology
Abstract/Summary:
Class-imbalance learning is one of the most challenging problems in data mining. In imbalanced datasets, the number of majority-class examples is significantly larger than the number of minority-class examples. This skewed distribution makes many traditional machine learning algorithms ineffective at predicting minority-class examples, because these algorithms aim for the best overall accuracy and tolerate misclassified minority examples as long as overall accuracy improves. Class-imbalance problems have attracted increasing attention in recent years because they play an important role in many practical applications, such as medical diagnosis, credit card fraud detection, and computer virus detection.

The goal of class-imbalance learning is to achieve high prediction accuracy on the minority classes without severely decreasing the accuracy on the majority classes. Most well-established classification techniques cannot be applied directly to imbalanced data, and most algorithms in this domain focus on two-class imbalance learning, while the multi-class imbalance learning problem still needs considerable research effort. This thesis studies the influence of feature selection and data selection algorithms on imbalance learning, reviews recent advances in multi-class imbalance learning, and designs a new multi-class imbalance classification algorithm, DECOC.

In a large-scale experiment on the influence of feature selection and data selection on two-class imbalance learning, the effects of 10 feature selection algorithms and 6 data selection algorithms on 3 classification algorithms are compared on 42 datasets. The experiment uses two different orderings: <feature selection - data selection - classification> and <data selection - feature selection - classification>. For each dataset, combining the feature selection, data selection, and classification algorithms in both orderings yields 10 × 6 × 3 × 2 = 360 configurations. By summarizing the classification performance under five evaluation measures, several conclusions are drawn, including, but not limited to, the best combination of feature selection and data selection algorithms for each classification algorithm.
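To make the scale of this experiment concrete, the following minimal Python sketch enumerates the 10 × 6 × 3 × 2 = 360 configurations per dataset; the algorithm names and the loop body are hypothetical placeholders for illustration, not the implementation used in the thesis.

    # Minimal sketch (not the thesis code): enumerate the experimental grid of
    # <feature selection, data selection, classifier, ordering> configurations.
    from itertools import product

    feature_selectors = [f"FS{i}" for i in range(1, 11)]  # 10 feature selection algorithms (placeholders)
    data_selectors = [f"DS{i}" for i in range(1, 7)]      # 6 data selection algorithms (placeholders)
    classifiers = [f"CLF{i}" for i in range(1, 4)]        # 3 classification algorithms (placeholders)
    orderings = ["FS->DS->CLF", "DS->FS->CLF"]            # the two pipeline orderings

    configs = list(product(feature_selectors, data_selectors, classifiers, orderings))
    assert len(configs) == 10 * 6 * 3 * 2                 # 360 configurations per dataset

    for fs, ds, clf, ordering in configs:
        # In the real experiment each configuration is trained and then scored
        # with the five evaluation measures; here we only print the grid.
        print(ordering, fs, ds, clf)

Each configuration is then evaluated on all 42 datasets, and the results are aggregated per classifier to identify the best feature selection and data selection combinations.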
A great deal of research has focused on the influence of either feature selection or data selection on classification; however, their combined influence has not been studied. The contribution of this part of the thesis is therefore that it is the first study of the joint influence of feature selection and data selection algorithms on class-imbalance learning.

For the multi-class imbalance learning problem, 16 well-established multi-class imbalance classification algorithms are analyzed in detail, and experiments are designed to compare their performance using five evaluation measures: accuracy, G-mean, F-measure, AUC, and running-time efficiency. The results demonstrate that DOVO (Diversified One-against-One) is the preferred multi-class imbalance classification algorithm when running time is not considered. When running time is taken into account, imECOC + sparse, imECOC + dense, AdaBoost.M1, and SAMME are better options.

Finally, because research on the multi-class imbalance learning problem is still limited, a new multi-class imbalance classification algorithm, DECOC, is designed. DECOC is a multiple classifier system that generates an ECOC matrix under the sparse coding scheme. Since the dichotomy classifiers should contribute differently to the final prediction, DECOC assigns weights to the dichotomies and uses a weighted distance for decoding, where the optimal weights are obtained by minimizing a weighted loss in favor of the minority classes. In experiments on 42 datasets, the new algorithm outperforms existing methods on four evaluation measures: ACC, G-mean, F-measure, and AUC.

Overall, this work provides a useful reference for future research on imbalance learning and can help practitioners pick appropriate algorithms for their specific applications.
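To illustrate the weighted decoding step described in the DECOC design above, the sketch below implements a generic weighted-distance decoder for a sparse ECOC codebook; the function name, the toy codebook, and the uniform weights are assumptions for illustration, and the thesis's procedure for optimizing the weights is not reproduced here.

    # Generic weighted-distance ECOC decoding (illustrative sketch, not DECOC itself).
    import numpy as np

    def weighted_ecoc_decode(codebook, dichotomy_outputs, weights):
        """Return the index of the class whose codeword is closest, under a
        weighted disagreement distance, to the dichotomy outputs.

        codebook          : (n_classes, n_dichotomies), entries in {-1, 0, +1};
                            0 means the class is ignored by that dichotomy (sparse scheme).
        dichotomy_outputs : (n_dichotomies,) predictions in {-1, +1}.
        weights           : (n_dichotomies,) non-negative dichotomy weights.
        """
        mask = codebook != 0                                # ignore zero-coded positions
        disagree = (codebook != dichotomy_outputs) & mask   # per-dichotomy disagreement
        distances = (disagree * weights).sum(axis=1)        # weighted distance per class
        return int(np.argmin(distances))

    # Toy example: 3 classes, 4 dichotomies, equal weights.
    codebook = np.array([[+1, -1,  0, +1],
                         [-1, +1, +1,  0],
                         [ 0, -1, -1, -1]])
    outputs = np.array([+1, -1, -1, +1])
    weights = np.ones(4)
    print(weighted_ecoc_decode(codebook, outputs, weights))  # -> 0

In DECOC itself, these weights are not fixed but are obtained by minimizing a weighted loss in favor of the minority classes, as described above.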
Keywords/Search Tags: data mining, imbalance data, two-class, multi-class, decomposition