| Classifying rare classes is important in many practical applications. Because instances of the target class are scarce and the datasets are highly skewed, many traditional classifiers have difficulty classifying them correctly. The problem is unusual and complicated, and few algorithms have been designed specifically for rare-class classification. This dissertation studies ensembles of different kinds of classifiers for classifying rare classes. It proposes a novel integration approach, called EDKC (Ensemble of Different Kind of Classifiers), which combines several different kinds of classifiers into one ensemble classifier and classifies unknown samples by weighted voting. Experiments on benchmark datasets from the UCI Machine Learning Repository show that EDKC achieves not only very high overall accuracy but also a reasonably high F-measure, which indicates a good balance between recall and precision for the rare class. Ensemble learning comes from the field of machine learning; it has been one of the most effective learning methods of the last ten years and can improve the predictive accuracy of weak classifiers. Compared with a single classifier, it shows fewer overfitting problems. The ensemble method used in this dissertation differs from bagging and boosting: whereas bagging and boosting combine classifiers of the same kind, the new method can combine classifiers of different kinds. The ensemble can therefore draw on the strengths of many classifiers and achieve better overall accuracy. Ensemble learning rests on the premise that different classifiers misclassify different samples, but we found that some samples in the data are misclassified by many classifiers. Such samples have a harmful effect on the construction of classifiers. 
In this dissertation, samples that many classifiers fail to classify correctly are called outliers. We try deleting these outliers and then building the classifiers, with the aim of improving their predictive accuracy. Through study and practice on the rare-class classification problem, the author has identified some feasible rules. The approach not only improves predictive power on the rare class to some extent but also attains very high overall accuracy. This work offers a new perspective on the rare-class problem and provides extensive experimental data for future research. |
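
The three ideas named in the abstract (weighted voting over different classifiers, flagging samples that many classifiers get wrong, and the F-measure for the rare class) can be illustrated with a minimal sketch. This is not the EDKC implementation: the function names, the hard-coded predictions of three hypothetical classifiers, the voting weights, and the `min_wrong` threshold are all assumptions chosen for illustration, since the abstract does not state how the weights or the outlier criterion are set.

```python
from collections import defaultdict


def weighted_vote(labels, weights):
    """Return the label whose voters carry the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def find_outliers(truth, predictions, min_wrong):
    """Indices of samples misclassified by at least `min_wrong` classifiers.

    `min_wrong` is an assumed threshold; the abstract only says "many".
    """
    return [
        i for i, label in enumerate(truth)
        if sum(p[i] != label for p in predictions) >= min_wrong
    ]


def f_measure(truth, predicted, rare_label):
    """Harmonic mean of precision and recall for the rare class."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == rare_label and p == rare_label for t, p in pairs)
    fp = sum(t != rare_label and p == rare_label for t, p in pairs)
    fn = sum(t == rare_label and p != rare_label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: true labels and the outputs of three hypothetical classifiers.
truth = ["rare", "common", "common", "rare"]
predictions = [
    ["rare", "common", "rare", "common"],    # hypothetical classifier 1
    ["common", "common", "common", "common"],  # hypothetical classifier 2
    ["rare", "common", "common", "common"],  # hypothetical classifier 3
]
weights = [0.4, 0.3, 0.3]  # assumed weights, one per classifier

# Combine the per-classifier predictions sample by sample.
ensemble = [
    weighted_vote([p[i] for p in predictions], weights)
    for i in range(len(truth))
]
# Flag samples that all three classifiers get wrong.
outliers = find_outliers(truth, predictions, min_wrong=3)
# Score the ensemble on the rare class.
score = f_measure(truth, ensemble, "rare")
```

In this toy setup the flagged indices would be the candidates for removal before the classifiers are rebuilt; the sketch only identifies them and does not retrain anything.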