Integrating Heterogeneous Classifiers for Rare-Class Classification

Posted on:2008-03-18

Degree:Master

Type:Thesis

Country:China

Candidate:L N Sun

GTID:2208360215960480

Subject:Computer software and theory

Abstract/Summary:

Classifying rare classes is important in many practical applications. Because instances of the target class are scarce, many traditional classifiers have difficulty classifying them correctly, and because the problem is unusual and complicated, few algorithms have been designed specifically for rare-class classification. This dissertation studies ensembles of different kinds of classifiers for classifying rare classes in highly skewed datasets. It proposes a novel integration approach, called EDKC (Ensemble of Different Kind of Classifiers). EDKC integrates several different kinds of classifiers into an ensemble classifier and classifies unknown samples by weighted voting. Experiments on benchmark datasets from the UCI Machine Learning Repository show that EDKC achieves not only a very high overall accuracy but also a reasonably high F-measure, which indicates a good balance between recall and precision for the rare class.

Ensemble learning comes from the field of machine learning. It has been one of the most effective learning methods of the last ten years and can improve the predictive accuracy of weak classifiers. Compared with a single classifier, it is less prone to overfitting. This dissertation uses a new ensemble learning method. It differs from bagging and boosting in that bagging and boosting combine only classifiers of the same kind, whereas the new method can combine classifiers of different kinds. The ensemble can therefore draw on the advantages of many classifiers and achieve better overall accuracy.

Ensemble learning rests on the premise that different classifiers misclassify different samples, but we found that some samples in the dataset are misclassified by many classifiers. Such samples have a harmful effect on building classifiers. In this dissertation, samples that many classifiers fail to classify correctly are called outliers. We try deleting those outliers before building the classifiers in order to improve predictive accuracy.

Through study and practice on the rare-class classification problem, the author has explored some feasible rules. The approach not only improves predictive power on the rare class to some extent but also attains very high overall accuracy. This offers a new perspective for the study of the rare-class problem and provides a large amount of experimental data for future research.
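
The weighted voting, the F-measure, and the outlier notion described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names, the example weights, and the `min_wrong` threshold are all hypothetical, since the abstract does not say how EDKC assigns weights or how many classifiers must fail before a sample counts as an outlier.

```python
from collections import defaultdict


def weighted_vote(labels, weights):
    """Return the label whose supporting classifiers carry the most total weight."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def ensemble_predict(per_classifier_preds, weights):
    """per_classifier_preds[k][i] is the label classifier k gives to sample i."""
    return [weighted_vote(column, weights) for column in zip(*per_classifier_preds)]


def f_measure(y_true, y_pred, rare_label):
    """Harmonic mean of precision and recall, computed for the rare class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == rare_label and p == rare_label for t, p in pairs)
    fp = sum(t != rare_label and p == rare_label for t, p in pairs)
    fn = sum(t == rare_label and p != rare_label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def flag_outliers(y_true, per_classifier_preds, min_wrong):
    """Indices of samples misclassified by at least min_wrong classifiers.

    min_wrong is an assumed threshold, not a value taken from the dissertation.
    """
    return [
        i
        for i, column in enumerate(zip(*per_classifier_preds))
        if sum(p != y_true[i] for p in column) >= min_wrong
    ]


if __name__ == "__main__":
    # Made-up labels and predictions from three hypothetical classifiers.
    y_true = ["rare", "common", "common", "rare", "common"]
    preds = [
        ["rare", "common", "common", "common", "rare"],
        ["rare", "common", "rare", "common", "rare"],
        ["common", "common", "common", "common", "rare"],
    ]
    weights = [0.5, 0.3, 0.2]  # assumed weights, e.g. derived from validation accuracy

    y_pred = ensemble_predict(preds, weights)
    print(y_pred)
    print(f_measure(y_true, y_pred, "rare"))
    print(flag_outliers(y_true, preds, min_wrong=3))
```

In a pipeline along the lines the abstract describes, the flagged indices would be dropped from the training set before the base classifiers are rebuilt. Whether that helps on a given dataset is an empirical question.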

Keywords/Search Tags:

Classification, Rare Classes, Ensemble Learning, Outliers

Related items

1	Based Eep Rare Class Classification Problem
2	Ensemble Learning And Ensemble Selection For Rare Class Problem
3	Hierarchical Classification with Rare Categories and Inconsistencie
4	Research On Subspace Ensemble Learning
5	Study On Ensemble One-class Classification And Its Applications
6	Research On Multi-source People Web Pages Classification Based On Ensemble Learning
7	Research On Classification Of Satellite Remote Sensing Image Based On Segmentation And Ensemble Learning
8	Research On Chinese Emotional Classification Based On Ensemble Learning
9	Hybrid Ensemble Learning For Imbalanced Data
10	The Research On Ensemble Incremental Learning Classification Algorithm