
Penalized regression methods and validation, with particular focus on chemometric data

Posted on: 2009-01-15
Degree: Ph.D.
Type: Dissertation
University: University of Minnesota
Candidate: Kraker, Jessica Jo
Full Text: PDF
GTID: 1440390002991218
Subject: Statistics
Abstract/Summary:
Quantitative Structure Activity/Property Relationship (QSAR/QSPR) models are general methods used in the area of chemometrics to predict a biological activity or property (such as toxicity) of a compound based on p chemical descriptors of various types. In the context of chemometrics, we analyze prediction problems which may also call for the concurrent selection of predictors with fitting of the regression model. We present an overview and comparison of current commonly applied methods for such analyses.

Model selection from among several models requires the further assessment of the model utility. A review of possible methods, including cross-validation, for this purpose is presented. Results of cross-validation applied to real and simulated datasets are summarized.

Beginning with the closed-form ridge regression model (with L2-norm loss and penalty) and advancing to more computationally-intensive methods (such as the lasso and elastic net), the possibilities for penalized regression have progressed dramatically in recent years. While the methods required to fit these models appropriately require large amounts of computation time, the improvements in prediction accuracy outweigh this concern. Two new penalized regression models (with L1-norm loss functions) are presented along with algorithms for fitting these models. Programming is implemented in the R environment to obtain and to assess the fitted models.
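
To make the closed-form ridge model and the cross-validation step mentioned in the abstract concrete, the following is a minimal sketch in pure Python. It is not the dissertation's code (which was written in R) and does not reproduce its two new L1-loss models. Every function name, the toy data, the candidate penalty values, and the fold assignment are assumptions chosen for illustration. It assumes no intercept and uses the standard ridge solution beta = (X'X + lam I)^(-1) X'y.

```python
# Illustrative sketch only: closed-form ridge regression with k-fold
# cross-validation, in pure Python (no third-party libraries).
# All names and the toy data are hypothetical, not from the dissertation.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ridge_fit(X, y, lam):
    """Return beta minimizing ||y - X beta||^2 + lam * ||beta||^2 (no intercept)."""
    p = len(X[0])
    # Build X'X + lam * I and X'y.
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y_k for row, y_k in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def predict(X, beta):
    """Linear predictions X beta."""
    return [sum(x_j * b_j for x_j, b_j in zip(row, beta)) for row in X]

def kfold_cv_mse(X, y, lam, k):
    """Mean squared prediction error over k folds (fold f holds indices f, f+k, ...)."""
    n = len(y)
    total = 0.0
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        train_idx = [i for i in range(n) if i % k != fold]
        beta = ridge_fit([X[i] for i in train_idx],
                         [y[i] for i in train_idx], lam)
        preds = predict([X[i] for i in test_idx], beta)
        total += sum((y[i] - p_i) ** 2 for i, p_i in zip(test_idx, preds))
    return total / n

# Toy data (hypothetical): y = 2*x1 - 1*x2 exactly, with no noise.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [3.0, 1.0]]
y = [2.0 * a - 1.0 * b for a, b in X]

beta_ols = ridge_fit(X, y, 0.0)      # lam = 0 reduces to least squares
beta_ridge = ridge_fit(X, y, 1.0)    # lam > 0 shrinks the coefficients
cv_by_lambda = {lam: kfold_cv_mse(X, y, lam, k=3) for lam in (0.0, 0.1, 1.0)}
best_lambda = min(cv_by_lambda, key=cv_by_lambda.get)
```

Because the toy response is noise-free, cross-validation favors the unpenalized fit here. With noisy or highly collinear descriptors, as is common in QSAR data, a positive penalty would typically be selected instead. The lasso and elastic net have no closed-form solution of this kind and need iterative fitting, which this sketch does not attempt.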
Keywords/Search Tags: Methods, Models, Penalized regression