This dissertation is about classification methods and class probability prediction. It can be roughly divided into four parts. In the first part, we study two classes of problems in which Boosting is known to overfit the data: the first arises when the training data are corrupted by independent label noise, and the second when the class regions overlap significantly. We begin by observing that, in the proper framework, overlapping class regions are a special case of noisy data. We introduce a new ensemble learning strategy, the BB algorithm, based on the careful application of both Bagging and Boosting. We demonstrate experimentally that this algorithm outperforms Boosting when the training set is noisy and, importantly, performs nearly identically otherwise. In the second part, we provide empirical evidence for a new explanation of Boosting's otherwise remarkable resistance to overfitting by comparing Boosting with the BB algorithm. The third part of this dissertation studies the bias and variance decomposition of the classification error rate. Specifically, we demonstrate that in noisy environments Bagging reduces not only the variance but also the bias. Finally, we address directly the estimation of conditional class probabilities. We propose a new algorithm, termed LogitTree, which combines a linear logistic regression model with tree-structured methodology. We test LogitTree against 11 competing methods on 7 simulated models, and it achieves the best prediction accuracy on all 7.
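As a rough illustration of the generic idea of applying Bagging on top of Boosting under label noise, the following Python sketch bags boosted base learners with scikit-learn; it is only a sketch of the combination strategy in general, not the BB algorithm developed in this dissertation, and the data set, noise rate, and all parameter values are placeholders.

```python
# Hypothetical sketch: bagging an ensemble of boosted classifiers.
# This illustrates the generic Bagging-of-Boosting idea only; it is not
# the BB algorithm of this dissertation, whose combination rule differs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Simulated two-class data; flip 20% of training labels to mimic label noise.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
rng = np.random.default_rng(0)
flip = rng.random(len(y_tr)) < 0.20
y_tr_noisy = np.where(flip, 1 - y_tr, y_tr)

# Boosting alone (prone to overfit label noise) vs. bagged boosting.
boost = AdaBoostClassifier(n_estimators=100, random_state=0)
bagged_boost = BaggingClassifier(
    AdaBoostClassifier(n_estimators=100, random_state=0),
    n_estimators=25,
    random_state=0,
)

for name, clf in [("Boosting", boost), ("Bagged Boosting", bagged_boost)]:
    clf.fit(X_tr, y_tr_noisy)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```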