Comparison of the ordinary least squares method and some regularization methods under multicollinearity in linear regressio

Posted on: 2017-02-10
Degree: M.S
Type: Thesis
University: Louisiana State University Health Sciences Center
Candidate: Huang, Yi
Full Text: PDF
GTID: 2460390011463100
Subject: Biostatistics
Abstract/Summary:
In this thesis, we reviewed some variable selection techniques and the related theory for the linear regression model. The Ordinary Least Squares (OLS) method is one of the most commonly used techniques for estimating the coefficients of input variables in linear regression. However, the OLS estimates perform poorly when the input variables are strongly correlated with one another, which is known as the multicollinearity problem [Hoerl and Kennard, 1970]. To address this problem, penalized methods such as ridge regression [Hoerl and Kennard, 1970], the LASSO [Tibshirani, 1996] and the elastic net [Zou, 2005] were proposed. This thesis focuses on comparing the performance of four approaches: the OLS estimates, the ridge regression estimates, the LASSO estimates and the elastic net estimates.

In the first part of this thesis, we gave a brief introduction to the OLS estimates, the ridge regression, the LASSO and the elastic net. In chapter 2, we reviewed the concepts and the corresponding properties. In chapter 3, different datasets were simulated to compare the performance of each method. In chapter 4, we applied the four methods to one real data study.

The conclusions are as follows. First, under multicollinearity, the linear regression models obtained by the ridge regression, the LASSO and the elastic net may have a smaller mean square error than the model obtained by the OLS estimates. Second, the ridge regression estimates can only shrink the estimated coefficients and cannot perform variable selection. Third, the LASSO estimates and the elastic net estimates can shrink the estimated coefficients and perform variable selection simultaneously. Finally, the linear regression model obtained by the elastic net estimates may have a smaller mean square error than the model obtained by the LASSO estimates when the data set contains highly correlated variables.
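The comparison described in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the simulation design used in the thesis: it assumes two predictors with correlation `rho`, no intercept, a single penalized objective (1/2n)‖y − Xb‖² + λ(α‖b‖₁ + (1 − α)/2 ‖b‖²) solved by coordinate descent, and arbitrary penalty values. The names `simulate`, `fit`, `METHODS` and `coefficient_mse` are hypothetical, and the error reported is the squared error of the coefficient estimates, which is only one possible reading of "mean square error".

```python
import random

def simulate(n, rho, beta, seed):
    """Draw n rows with two predictors whose population correlation is rho."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        row = [z1, rho * z1 + (1 - rho ** 2) ** 0.5 * z2]
        X.append(row)
        y.append(sum(b * x for b, x in zip(beta, row)) + rng.gauss(0, 1))
    return X, y

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit(X, y, lam, alpha, n_iter=1000):
    """Coordinate descent for the assumed objective.

    lam = 0 gives OLS, alpha = 0 gives ridge, alpha = 1 gives the LASSO,
    and 0 < alpha < 1 gives the elastic net. No intercept is fitted.
    """
    n, p = len(X), len(X[0])
    # Precompute X'X / n and X'y / n so each sweep costs O(p^2).
    G = [[sum(X[i][j] * X[i][k] for i in range(n)) / n for k in range(p)]
         for j in range(p)]
    c = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of predictor j with the residual that excludes it.
            r = c[j] - sum(G[j][k] * b[k] for k in range(p) if k != j)
            b[j] = soft_threshold(r, lam * alpha) / (G[j][j] + lam * (1 - alpha))
    return b

# (lam, alpha) pairs; the penalty values are arbitrary, not tuned.
METHODS = {
    "OLS": (0.0, 0.0),
    "ridge": (0.1, 0.0),
    "LASSO": (0.1, 1.0),
    "elastic net": (0.1, 0.5),
}

def coefficient_mse(reps=50, n=30, rho=0.95, beta=(2.0, 0.0)):
    """Average squared error of the coefficient estimates over reps datasets."""
    totals = {name: 0.0 for name in METHODS}
    for seed in range(reps):
        X, y = simulate(n, rho, beta, seed)
        for name, (lam, alpha) in METHODS.items():
            est = fit(X, y, lam, alpha)
            totals[name] += sum((e - t) ** 2 for e, t in zip(est, beta))
    return {name: total / reps for name, total in totals.items()}

if __name__ == "__main__":
    for name, mse in coefficient_mse().items():
        print(f"{name:12s} coefficient MSE = {mse:.3f}")
```

In this sketch the soft-thresholding step is what lets the LASSO and the elastic net set a coefficient exactly to zero, while the ridge update only divides by a larger denominator and therefore shrinks without selecting. In practice the penalty would be chosen by cross-validation rather than fixed in advance.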
Keywords/Search Tags:Linear, LASSO estimates, Method, Variable selection, Elastic net