Some Statistical Inference And Application Of High Dimensional Regression Models

Posted on: 2018-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: X C Zhao
GTID: 2310330563452382
Subject: Applied statistics
Abstract/Summary:
With the development of modern technology, massive and complex data sets have emerged in many fields, and high-dimensional data are a typical example. In biomedical studies using DNA microarrays, for instance, a single array typically measures thousands of genes, but because experiments are relatively expensive, only a small number of samples can usually be collected. The number of genes is therefore much greater than the sample size. Establishing high-dimensional linear models for microarray data and carrying out the corresponding statistical inference is becoming increasingly important.

Variance estimation is a fundamental problem in statistical inference, closely related to interval estimation and hypothesis testing for the model. In the classical linear model, ordinary least squares (OLS) is the most common method: the coefficients are chosen to minimize the residual sum of squares, and the variance estimate is obtained from those residuals. Obtaining reliable estimates from high-dimensional data is much more challenging, however, especially when the number of variables p exceeds the sample size n. If variables are first selected or the dimension is first reduced, and the variance is then estimated from the resulting model, the estimate can be very poor because important variables were lost or too many independent variables were chosen. Moreover, the asymptotic distribution of the variance estimator depends entirely on the variables selected in the first stage by the traditional variable selection method, and the variance of the variance estimator grows as the data dimension increases. Variance estimation in high-dimensional regression is therefore particularly important.

This thesis analyzes microarray data from rat eye tissue, which were used to study genes associated with TRIM32, a pathogenic gene that leads to Bardet-Biedl syndrome. A high-dimensional linear model is established for these genetic data and the corresponding statistical inference is carried out. Because variance estimation is an indispensable part of interval estimation and significance testing, refitted cross-validation (RCV) and the method of moments (MM) are used to estimate the variance, and both are compared with the traditional two-stage estimation method. The results show that the two methods reduce the bias caused by losing important variables or choosing too many independent variables, and so effectively improve the accuracy of variance estimation. The variance estimates obtained by the two methods are also consistent and asymptotically normal. Finally, the new variance estimators are used for interval estimation and hypothesis testing, and a significant linear regression model is obtained.
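To make the refitted cross-validation (RCV) idea named in the abstract concrete, the following is a minimal sketch. It is not the thesis's implementation. It assumes the usual RCV recipe: split the sample into two halves, select variables on one half, refit OLS on the other half using only those variables, and average the two resulting variance estimates. The selector used here (keeping the k columns with the largest absolute uncentered correlation with the response), the function names, the choice of k, and the synthetic data are all assumptions made for illustration. Only the Python standard library is used.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def screen(X, y, k):
    """Stage 1 (assumed selector): keep the k columns with the largest
    absolute uncentered correlation with y."""
    def score(j):
        col = [row[j] for row in X]
        dot = sum(c * v for c, v in zip(col, y))
        return abs(dot) / math.sqrt(sum(c * c for c in col))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def refit_variance(X, y, cols):
    """Stage 2: OLS (no intercept) on the chosen columns, then RSS / (n - k)."""
    Z = [[row[j] for j in cols] for row in X]
    k = len(cols)
    gram = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    rhs = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(k)]
    beta = solve(gram, rhs)
    rss = sum((t - sum(bj * zj for bj, zj in zip(beta, z))) ** 2
              for z, t in zip(Z, y))
    return rss / (len(y) - k)


def rcv_variance(X, y, k):
    """Average the two refitted estimates obtained by swapping the halves."""
    half = len(y) // 2
    X1, y1, X2, y2 = X[:half], y[:half], X[half:], y[half:]
    est1 = refit_variance(X2, y2, screen(X1, y1, k))  # select on half 1, refit on half 2
    est2 = refit_variance(X1, y1, screen(X2, y2, k))  # select on half 2, refit on half 1
    return (est1 + est2) / 2


# Synthetic example (not the thesis data): n = 60 samples, p = 200 variables,
# three active coefficients, true noise variance 1.
rng = random.Random(0)
n, p = 60, 200
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 2 * row[1] + 2 * row[2] + rng.gauss(0, 1) for row in X]
sigma2_hat = rcv_variance(X, y, k=5)
print(f"RCV variance estimate: {sigma2_hat:.3f}")
```

Because selection and refitting use disjoint halves, variables that were picked only through chance correlation with the noise in one half have no such advantage in the other half. This is why RCV is less prone than a naive two-stage estimate to underestimating the variance. In practice a penalized selector such as the lasso and a linear algebra library would normally replace the hand-written screening and solver used here.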
Keywords/Search Tags: high-dimensional linear regression, variance estimation, significance test, refitted cross-validation, method-of-moments