This dissertation is about classification methods and class probability prediction. It can be roughly divided into four parts. In the first part, we study two classes of problems in which Boosting is known to overfit the data: the first arises when the training data are corrupted by independent label noise, and the second when the class regions overlap significantly. We begin by observing that, in the proper framework, overlapping class regions are a special case of noisy data. We introduce a new ensemble learning strategy, the BB algorithm, based on the careful application of both Bagging and Boosting. We demonstrate experimentally that this algorithm outperforms Boosting when the training set is noisy and, importantly, performs nearly identically otherwise. In the second part, we provide empirical evidence for a new explanation of Boosting's otherwise remarkable resistance to overfitting by comparing Boosting with the BB algorithm. The third part of this dissertation studies the bias and variance decomposition of the classification error rate. Specifically, we demonstrate that in noisy environments Bagging reduces not only the variance but also the bias. Finally, we address directly the estimation of conditional class probabilities. We propose a new algorithm, termed LogitTree, which combines a linear logistic regression model with tree-structured methodology. We test LogitTree against 11 competing methods on 7 simulated models, and it achieves the best prediction accuracy on all 7.
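As a rough illustration of the generic idea of applying Bagging on top of Boosting under label noise, the following Python sketch bags boosted base learners with scikit-learn; it is only a sketch of the combination strategy in general, not the BB algorithm developed in this dissertation, and the data set, noise rate, and all parameter values are placeholders.

```python
# Hypothetical sketch: bagging an ensemble of boosted classifiers.
# This illustrates the generic Bagging-of-Boosting idea only; it is not
# the BB algorithm of this dissertation, whose combination rule differs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Simulated two-class data; flip 20% of training labels to mimic label noise.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
rng = np.random.default_rng(0)
flip = rng.random(len(y_tr)) < 0.20
y_tr_noisy = np.where(flip, 1 - y_tr, y_tr)

# Boosting alone (prone to overfit label noise) vs. bagged boosting.
boost = AdaBoostClassifier(n_estimators=100, random_state=0)
bagged_boost = BaggingClassifier(
    AdaBoostClassifier(n_estimators=100, random_state=0),
    n_estimators=25,
    random_state=0,
)

for name, clf in [("Boosting", boost), ("Bagged Boosting", bagged_boost)]:
    clf.fit(X_tr, y_tr_noisy)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```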