SCAD Regression for High-Dimensional Data with Small Samples

Posted on: 2022-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhou
Full Text: PDF
GTID: 2480306572490344
Subject: Probability theory and mathematical statistics
Abstract/Summary:
The rapid growth in data scale and the diversification of data features have driven the rapid development of data analysis, and have also made the objects of analysis increasingly complex. More variables are needed to describe these complex objects, which produces high-dimensional data. In high-dimensional problems, the number of observed samples is often much smaller than the feature dimension because of cost, ethical, and other constraints, and this situation is common in fields such as medicine, biogenetics, and military science.

In this thesis, we focus on variable selection and predictive ability in high-dimensional linear regression models with small samples. The traditional variable selection methods SCAD and LASSO are applied to the high-dimensional small-sample problem and, combined with the Bayesian bootstrap and Monte Carlo simulation with random weights, two hybrid ensemble methods are proposed.

The first method is a SCAD shrinkage method based on Bayesian bootstrap sampling. Its basic idea is to randomly weight the bootstrap sampling through the Bayesian posterior in order to expand the size of the observed sample. The new samples are then processed by the SCAD method, and the median or mean of the regression coefficients obtained over multiple draws is used as the final estimate of the regression coefficients.

The second method is a SCAD shrinkage method based on Monte Carlo simulation with random weights. Its basic idea is to weight the original observed sample with random numbers whose mean is 1 to obtain new samples. The new samples are then processed by the SCAD method, and the median or mean of the regression coefficients obtained over multiple draws is used as the final estimate of the regression coefficients.

Both simulation analysis and empirical analysis show that, when the variable dimension p is much larger than the number of observed samples n, the two new methods are less affected by random noise. They not only inherit the variable selection ability of SCAD but also predict better than SCAD and LASSO, especially when the observed sample is small.
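The two procedures described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not give the weight distributions, the solver, or the tuning rule, so the sketch assumes Dirichlet(1, ..., 1) weights scaled to sum to n for the Bayesian bootstrap, i.i.d. Exponential(1) weights (mean 1) for the Monte Carlo variant, a plain coordinate-descent SCAD solver with a = 3.7, no intercept, and a fixed penalty level `lam`. The function names `scad_threshold`, `weighted_scad`, and `ensemble_scad` are hypothetical.

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """Univariate SCAD solution for a unit-scaled predictor (requires a > 2)."""
    az = abs(z)
    if az <= lam:
        return 0.0
    if az <= 2.0 * lam:
        return np.sign(z) * (az - lam)           # soft-thresholding zone
    if az <= a * lam:
        return ((a - 1.0) * z - np.sign(z) * a * lam) / (a - 2.0)
    return z                                      # no shrinkage for large |z|

def weighted_scad(X, y, w, lam, a=3.7, n_iter=200, tol=1e-6):
    """SCAD-penalized weighted least squares via coordinate descent (no intercept)."""
    n, p = X.shape
    sw = np.sqrt(w)
    Xw, yw = X * sw[:, None], y * sw              # fold weights into the data
    scale = np.sqrt((Xw ** 2).mean(axis=0))       # rescale columns to unit mean square
    scale[scale == 0] = 1.0
    Xs = Xw / scale
    beta = np.zeros(p)
    r = yw.copy()                                 # current residual
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            z = Xs[:, j] @ r / n + beta[j]
            b_new = scad_threshold(z, lam, a)
            delta = b_new - beta[j]
            if delta != 0.0:
                r -= Xs[:, j] * delta
                beta[j] = b_new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta / scale                           # back to the original column scale

def ensemble_scad(X, y, lam, n_draws=50, scheme="bayes_boot", agg="median", seed=0):
    """Refit SCAD under random observation weights and aggregate the coefficients."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    betas = []
    for _ in range(n_draws):
        if scheme == "bayes_boot":
            w = n * rng.dirichlet(np.ones(n))     # assumed: Bayesian bootstrap weights
        else:
            w = rng.exponential(1.0, size=n)      # assumed: i.i.d. weights with mean 1
        betas.append(weighted_scad(X, y, w, lam))
    betas = np.asarray(betas)
    return np.median(betas, axis=0) if agg == "median" else betas.mean(axis=0)
```

Calling `ensemble_scad(X, y, lam=0.3)` on an n-by-p design with p larger than n returns one aggregated coefficient vector of length p. Median aggregation tends to keep a coefficient at exactly zero when most draws exclude it, whereas the mean generally does not, which is one plausible reason to offer both options.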
Keywords/Search Tags:High-dimensional linear regression, Variable selection, Small sample size, Bayesian bootstrap, LASSO, SCAD, Monte Carlo Simulation