Empirical Likelihood Inference For High-Dimensional Data With A Diverging Number Of Parameters

Posted on: 2018-05-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J L Fang
Full Text: PDF
GTID: 1360330515966162
Subject: Statistics
Abstract/Summary:
In bioinformatics, health studies, financial analysis, and other fields, complex data such as high-dimensional data and censored data are often encountered. As the dimension increases, data analysis becomes more and more difficult. On the one hand, increasing dimension leads to the "curse of dimensionality". On the other hand, classical large-sample inference generally assumes that the dimension is fixed and relatively small while the sample size tends to infinity. When the dimension p tends to infinity with the sample size n, particularly in the "super high dimension" (p > n) situation, the results of classical statistical theory may no longer be valid. How to carry out statistical inference for such complex data has therefore become an active research topic in statistics.

The empirical likelihood method proposed by Owen (1988) is a nonparametric method of statistical inference. Compared with the traditional approach based on asymptotic normality, it has several advantages. For example, constructing confidence regions for parameters by empirical likelihood does not require estimating the asymptotic variance of the estimators. The shape and orientation of the resulting confidence regions are determined entirely by the data, and the regions are range preserving and transformation respecting.

In this thesis, we are mainly interested in statistical inference for complex data where the dimensionality p → ∞ as n → ∞. Variable selection is another active topic in high-dimensional data analysis, so we also study variable selection and parameter estimation for the semiparametric model and the additive hazards model by the penalized empirical likelihood method, again with p → ∞ as n → ∞. The main contents of the thesis are organized into the following chapters.

The second chapter investigates statistical inference for the high-dimensional semiparametric model. First, we construct estimators and confidence regions for the unknown parameters by the empirical likelihood method. With diverging dimensionality, i.e., p → ∞ as n → ∞, we prove that, under some mild conditions, the empirical log-likelihood ratio statistic for the unknown parameters is asymptotically normal and that the empirical likelihood estimator is consistent. Second, again with diverging dimensionality, we propose a penalized empirical likelihood method for parameter estimation and variable selection in the semiparametric model. We prove that, under some regularity conditions, the penalized empirical log-likelihood ratio for the high-dimensional sparse semiparametric model has an asymptotic chi-square distribution and that the penalized empirical likelihood estimator has the oracle property.

The third chapter investigates statistical inference for the high-dimensional additive hazards model with censored data. We construct estimators of the unknown parameters and show that they are consistent. We propose an empirical log-likelihood ratio statistic for the full vector of unknown parameters and another for a component of the unknown parameters. We prove that, with diverging dimensionality and under some mild conditions, these statistics have an asymptotic normal distribution and an asymptotic chi-square distribution, respectively. They can be used to construct confidence regions for the unknown parameters and confidence regions (intervals) for a component of the parameters. In addition, we propose a penalized empirical likelihood method for parameter estimation and variable selection in the high-dimensional sparse additive hazards model with censored data. Under some mild conditions, the proposed penalized empirical log-likelihood ratio for the unknown parameters has an asymptotic chi-square distribution, and the penalized empirical likelihood estimator has the oracle property.

The fourth chapter investigates statistical inference for the high-dimensional heteroscedastic partially linear single-index model. We propose an empirical log-likelihood ratio statistic for the full vector of unknown parameters and another for a component of the unknown parameters. We prove that, with diverging dimensionality and under some mild conditions, these statistics have an asymptotic normal distribution and an asymptotic chi-square distribution, respectively. They can be used to construct confidence regions for the unknown parameters and confidence regions (intervals) for a component of the parameters.

The fifth chapter investigates statistical inference for two-sample problems with growing dimension. Using the empirical likelihood method, we construct confidence regions for the difference between the means of two samples and for the difference between the coefficients of two-sample linear models. Under some mild conditions, the proposed empirical log-likelihood ratio statistic is asymptotically normal, and the empirical likelihood estimator of the difference between the coefficients of the two-sample linear models is consistent.

The methods proposed in this thesis are illustrated with simulation studies and real data examples.
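
For readers unfamiliar with the method, the empirical likelihood ratio can be illustrated in its simplest setting, the mean of a p-dimensional sample. The sketch below is only that: it is not the thesis's procedure for semiparametric, additive hazards, or single-index models. The function names `el_log_ratio` and `normalized_el_stat` are hypothetical. The centering and scaling (-2 log R - p) / sqrt(2p) is the normal approximation commonly used when p grows with n, and it is assumed here because the abstract does not state the exact form of its statistics.

```python
import numpy as np

def el_log_ratio(x, mu, max_iter=100, tol=1e-10):
    """-2 log R(mu) for the mean of the rows of x (shape n x p).

    Weights are w_i = 1 / (n * (1 + lam' z_i)) with z_i = x_i - mu, and
    lam maximizes the concave dual g(lam) = sum_i log(1 + lam' z_i).
    Assumes mu lies inside the convex hull of the data.
    """
    z = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    lam = np.zeros(z.shape[1])

    def g(l):
        d = 1.0 + z @ l
        # Outside the feasible region the weights would be non-positive.
        return -np.inf if np.any(d <= 0) else np.log(d).sum()

    for _ in range(max_iter):
        d = 1.0 + z @ lam
        zs = z / d[:, None]
        grad = zs.sum(axis=0)            # sum_i z_i / (1 + lam' z_i)
        if np.linalg.norm(grad) < tol:
            break
        hess = zs.T @ zs                 # negative Hessian of g
        step = np.linalg.solve(hess, grad)
        # Damped Newton ascent: halve the step until it is feasible
        # and does not decrease g.
        t, cur = 1.0, g(lam)
        while t > 1e-12 and g(lam + t * step) < cur:
            t /= 2.0
        lam = lam + t * step
    return 2.0 * g(lam)

def normalized_el_stat(x, mu):
    """(-2 log R - p) / sqrt(2p), compared with N(0, 1) when p diverges."""
    p = np.asarray(x).shape[1]
    return (el_log_ratio(x, mu) - p) / np.sqrt(2.0 * p)
```

With p fixed, `el_log_ratio(x, mu)` would be compared with a chi-square quantile with p degrees of freedom; with p growing, `normalized_el_stat(x, mu)` would be compared with standard normal quantiles. Both comparisons yield a confidence region for the mean without estimating a variance separately.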
Keywords/Search Tags: Empirical likelihood, Penalized empirical likelihood, High-dimensional data, Censored data, Semiparametric models, Additive hazards models, Partially linear single-index models, Variable selection