
Coefficient Test And Its Application Of High-dimensional Regression Model

Posted on: 2021-05-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Tan
Full Text: PDF
GTID: 1360330632953403
Subject: Mathematical Statistics
Abstract/Summary:
With the rapid development of modern science and technology, many fields now produce complex data, such as missing, censored, truncated, and high-dimensional data. The emergence of high-dimensional data not only brings a wealth of useful information but also creates new opportunities for the development of statistics. When the covariate dimension p is fixed and the sample size n is large, traditional statistical methods perform very well. In high-dimensional settings, however, classical statistical inference theory may fail. How to carry out statistical inference with high-dimensional data has therefore long been a focus of research. This dissertation studies coefficient tests for high-dimensional regression models.

Graphs and networks are common ways of depicting information. In biology in particular, many processes, such as metabolic pathways, are represented by graphs. To incorporate biological network or graph information, the first chapter studies statistical inference for a single regression coefficient in the high-dimensional linear model. First, an unbiased estimator of the single regression coefficient is constructed based on the L1 penalty and the Laplacian matrix, and its asymptotic distribution is derived. Extensive numerical examples are presented to demonstrate the advantages of the proposed test statistic in finite samples. Finally, the proposed method is applied to the human liver cohort dataset. Compared with other methods, the method proposed in this chapter can effectively identify important genes.

Although the linear model is simple, relationships in real data are often nonlinear. To characterize linear and nonlinear relationships at the same time, scholars proposed partially linear models. In Chapter 3, we study the global hypothesis test for the regression coefficients of partially linear models when the number of covariates in the linear part diverges. First, under the null hypothesis, we use polynomial splines to estimate the unknown function. Then, based on the expectation of the score function, we construct U-type test statistics and derive their asymptotic distributions under the null hypothesis and under local alternative hypotheses. Simulation results indicate that the proposed test statistic works well in many scenarios. The proposed test can also distinguish the null hypothesis from the alternative hypothesis even under misspecified models. Finally, the method proposed in this chapter is applied to breast cancer data, and the results show that the proposed test statistics can effectively identify useful variables.

In the first two chapters, the errors are assumed to be independent and identically distributed. In real data, however, and in financial data especially, heteroscedasticity is common. In Chapter 4, we therefore study global and local tests of the regression coefficients in high-dimensional expectile linear regression models. As in Chapter 3, we construct U-type test statistics. Theoretically, using the martingale central limit theorem and some mild conditions, we obtain the limiting distributions of the proposed test statistics under the null hypothesis and under local alternative hypotheses. Numerical results show that the proposed test statistics can effectively distinguish the null hypothesis from the alternative hypothesis, especially in non-sparse settings. Finally, the proposed test statistics are applied to stock return data. The results show that the high-dimensional linear model has difficulty describing the relationship between the covariates and the response variable.
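
For readers unfamiliar with the expectile models mentioned in the abstract, the following is a minimal sketch of the standard asymmetric squared (expectile) loss and of a sample expectile computed by iterative reweighting. It is a generic textbook illustration and not the dissertation's test procedure; the function names `expectile_loss` and `sample_expectile` and the parameter `tau` are assumptions introduced here.

```python
import numpy as np

def expectile_loss(residuals, tau):
    """Asymmetric squared loss: weight tau on positive residuals, 1 - tau otherwise."""
    r = np.asarray(residuals, dtype=float)
    weights = np.where(r > 0, tau, 1.0 - tau)
    return weights * r ** 2

def sample_expectile(y, tau, n_iter=100, tol=1e-10):
    """Minimize the summed expectile loss over a constant m by iterative reweighting."""
    y = np.asarray(y, dtype=float)
    m = y.mean()  # the 0.5-expectile is the mean, so this is a natural start
    for _ in range(n_iter):
        # Observations above the current value get weight tau, the rest 1 - tau.
        w = np.where(y > m, tau, 1.0 - tau)
        m_new = np.sum(w * y) / np.sum(w)  # weighted-mean update
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

With `tau = 0.5` the loss is symmetric and the expectile reduces to the ordinary mean; values of `tau` above 0.5 penalize positive residuals more heavily and so pull the expectile toward the upper tail, which is what makes expectiles useful for describing heteroscedastic data.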
Keywords/Search Tags:Big data, B-spline, Debiased estimator, Expectile linear models, Factor model, Graph-constrained estimation, High-dimensional inference, High-dimensional test, Linear models, Partially linear models, Quantile, U statistics