
Statistical Inference For High-dimensional Data

Posted on: 2012-11-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y B Xiang
Full Text: PDF
GTID: 1110330338966308
Subject: Probability theory and mathematical statistics
Abstract/Summary:
High-dimensional data arise in diverse fields of practice, such as computational biology, health studies, financial analysis, and risk management. This dissertation focuses on the statistical analysis of high-dimensional data, and mainly considers two issues: high-dimensional testing and high-dimensional variable selection.

First, we give a brief introduction to high-dimensional data analysis in statistics. Second, we consider the problem of testing the independence of sets of variates in high dimensions. A test statistic is proposed and its asymptotic null distribution is derived as both the sample size and the number of variables go to infinity. Consequently, the test can be used when the number of variables is not small relative to the sample size, and in particular even when the number of variables exceeds the sample size. Third, we study the high-dimensional adaptive Lasso when the errors of the linear regression model have a finite 2k-th moment for some integer k > 0. We prove that, without the assumption of Gaussian-tailed errors, the adaptive Lasso still enjoys the oracle property. Moreover, we propose a two-step approach for handling ultra-high-dimensional data. Fourth, we consider the adaptive group Lasso in the situation where the number of factors diverges with the sample size; as with the adaptive Lasso, we establish the oracle property of the adaptive group Lasso in high-dimensional settings. Finally, we consider model selection for infinite-variance autoregressive models. In particular, we use two penalty methods to select variables and estimate coefficients simultaneously. First, we take the self-weighted least absolute deviation (SLAD) as the loss function and show that the penalized SLAD identifies the true model consistently and that the resulting estimator is asymptotically normal. Second, we show that the model-selection results of penalized SLAD can be improved by choosing the ordinary LAD as the loss function, although the limiting distribution of the resulting estimator then has no closed form. Thus, for model selection alone the latter performs better, while for further statistical inference the former is the better choice.
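The adaptive Lasso described above can be illustrated with a minimal two-step sketch: fit an initial root-n-consistent estimator, form the adaptive weights, and solve the weighted Lasso by rescaling the design matrix. This is only an illustration of the general technique, not the dissertation's procedure: the ridge initial estimator, the t-distributed errors (a stand-in for the finite-2k-th-moment, non-Gaussian-tail setting), and the penalty levels are all illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]          # only the first 3 variables are active
# heavier-than-Gaussian tails but finite low-order moments (t with 5 df)
y = X @ beta_true + rng.standard_t(df=5, size=n)

# Step 1: initial consistent estimate (ridge here, purely for illustration)
init = Ridge(alpha=1.0).fit(X, y).coef_
gamma = 1.0
w = 1.0 / (np.abs(init) ** gamma + 1e-8)  # adaptive weights: large where init is near zero

# Step 2: weighted Lasso via column rescaling: minimizing over b with penalty
# sum_j w_j |b_j| is equivalent to an ordinary Lasso on the columns X_j / w_j
Xs = X / w
lasso = Lasso(alpha=0.1).fit(Xs, y)
beta_hat = lasso.coef_ / w                # map back to the original scale

selected = np.flatnonzero(np.abs(beta_hat) > 1e-6)
print("selected variables:", selected)
```

Because the weights penalize coefficients with small initial estimates more heavily, noise variables are driven exactly to zero while the strong signals are only lightly shrunk, which is the mechanism behind the oracle property discussed above.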
Keywords/Search Tags:Independence of sets of variates, High dimensional data, Adaptive lasso, Oracle property, Adaptive group lasso, Infinite variance autoregressive model, Least absolute deviation