
Contributions to regression and classification tree method

Posted on: 2006-01-18
Degree: Ph.D.
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Song, Qinghua
GTID: 1458390008476819
Subject: Statistics
Abstract/Summary:
The dissertation consists of two parts: the first concerns regression problems and the second concerns classification problems.

In Part I, we present new methods for regression problems. The methods integrate GUIDE (Generalized, Unbiased, Interaction Detection and Estimation) trees with a spline-based method called MARS (Multivariate Adaptive Regression Splines), or with a sum of smooth functions called GAM (Generalized Additive Models), by fitting MARS or GAM models in the tree leaves. The goal of our approach is to decrease prediction error and increase interpretability by taking advantage of different regression methodologies. Experimental comparisons on real data sets show that MARS regression trees are superior to the two individual approaches. The results also suggest that adding GAM to linear regression trees does not provide a substantial gain in accuracy over the individual approaches. We also show that ensembles of GUIDE trees are superior to a new and popular tool called "random forest".

In Part II, we are interested in learning curve analysis of logistic regression and tree-structured algorithms. Results from a previous comparison of linear logistic regression with C4.5 show that linear logistic regression classifies more accurately on small to moderate-sized data sets but does not outperform C4.5 when the sample size is large. This is because the number of variables in linear logistic regression is held fixed, whereas the number of nodes in a C4.5 model can grow with the sample size. In our investigation, we use learning curves to study the effectiveness of tree-structured models (C4.5, LOTUS and QUEST) and logistic regression methods (linear logistic regression, quadratic logistic regression and logistic regression with interaction terms) for classification and probability rankings. By using trees as analysis tools, we find some patterns indicating that certain logistic regression methods can be extremely competitive, even for large data sets.
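
The idea in Part I of placing a fitted model in each tree leaf can be illustrated with a minimal sketch. It is not the dissertation's method: it assumes a single predictor, a single split chosen by exhaustive search over squared error (GUIDE selects splits differently), and a simple least-squares line in each leaf standing in for MARS or GAM. All function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b).

    Falls back to a constant model when x has no spread.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def sse(xs, ys, model):
    """Sum of squared errors of a fitted line on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def fit_stump_with_linear_leaves(xs, ys, min_leaf=3):
    """One-split tree with a separate fitted line in each leaf.

    Assumption: the split is chosen by exhaustive search for the
    threshold that minimizes the total SSE of the two leaf models.
    This is a stand-in, not GUIDE's split selection.
    Returns (threshold, left_model, right_model).
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        lx, ly = zip(*pairs[:i])
        rx, ry = zip(*pairs[i:])
        left_model, right_model = fit_line(lx, ly), fit_line(rx, ry)
        total = sse(lx, ly, left_model) + sse(rx, ry, right_model)
        if best is None or total < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (total, threshold, left_model, right_model)
    if best is None:
        raise ValueError("not enough distinct x values to split")
    return best[1:]


def predict(tree, x):
    """Route x to its leaf and evaluate that leaf's line."""
    threshold, left_model, right_model = tree
    a, b = left_model if x <= threshold else right_model
    return a + b * x
```

Calling fit_stump_with_linear_leaves on data whose trend changes at some point returns the split threshold together with one fitted line per side, and predict then evaluates the line of whichever leaf a new x falls into.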
Keywords/Search Tags:Regression, Classification, Data sets, Part, Methods