
Testing For Both Expectation Dependence And Heteroscedasticity And Inference For Non-sparse High-dimensional Models

Posted on: 2016-07-15    Degree: Doctor    Type: Dissertation
Country: China    Candidate: X H Zhu    Full Text: PDF
GTID: 1109330461984438    Subject: Financial mathematics and financial engineering
Abstract/Summary:
This dissertation consists of two main topics: testing for expectation dependence and heteroscedasticity, and estimation in transformation models. The first topic covers two kinds of testing: testing for positive expectation dependence, and heteroscedasticity checks in single index models. In the second topic we consider estimation for high-dimensional non-sparse linear transformation models. As two leitmotifs of statistics, testing and high-dimensional data are becoming progressively more prominent, both in theory and in practice.

1. Testing for Expectation Dependence

Wright (1987) [88] first proposed the notion of first-degree expectation dependence, and Li (2011) [54] extended the concept to higher-degree expectation dependence. The concept of expectation dependence has been widely used to study economic and financial problems, such as portfolio choice and asset allocation, the demand for a risky asset, portfolio diversification, optimal investment and so on.

It is well acknowledged that, in the context of dependence, first-degree expectation dependence is a stronger form of dependence than correlation between random variables. The notion has received increasing attention in recent years. But whether positive or negative expectation dependence holds is not at all clear-cut in practice. Directly assuming this type of dependence without statistical evidence can have devastating effects, resulting in adverse performance in equity premium and asset allocation problems. To the best of our knowledge, testing for expectation dependence has not yet received much attention.

In the first part of this dissertation, we use equivalent characterizations of first-degree positive expectation dependence to rewrite the null and alternative hypotheses, and we propose a test of Kolmogorov-Smirnov type for first-degree positive expectation dependence.
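To make the construction concrete, the following is a minimal sketch of a Kolmogorov-Smirnov type statistic for first-degree positive expectation dependence. It uses the standard characterization ED(t) = E[Y] - E[Y | X <= t] >= 0 for all t; the function name, scaling and lack of weighting are our illustrative choices, not the dissertation's exact statistic.

```python
import numpy as np

def ks_ped_statistic(x, y):
    # Under first-degree positive expectation dependence of Y on X,
    # ED(t) = E[Y] - E[Y | X <= t] >= 0 for every t.  The statistic
    # below takes the largest (root-n scaled) empirical violation of
    # this inequality over the observed values of X.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    ybar = y.mean()
    violations = []
    for t in np.sort(x):
        mask = x <= t                   # always nonempty: t is a data point
        ed_t = ybar - y[mask].mean()    # empirical ED(t)
        violations.append(-ed_t)        # positive value = violation of PED
    return np.sqrt(n) * max(violations)
```

Large values of the statistic speak against the null of positive expectation dependence; in practice its p-value would be simulated, e.g. by the nonparametric Monte Carlo procedure described below, since the null distribution is intractable.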
The related asymptotic properties are studied, showing that the proposed test controls the type I error well and is consistent against global alternative hypotheses. Further, the test can detect local alternative hypotheses distinct from the null hypothesis at a rate as close to n^{-1/2} as possible, which is the fastest possible rate in hypothesis testing. We also extend the test to higher-degree cases and obtain results parallel to those of the first-degree case. To implement the proposed tests, a nonparametric Monte Carlo test procedure is suggested to simulate p-values, because the sampling and limiting null distributions are intractable.

2. Heteroscedasticity Checks for Single Index Models

The single index model (SIM) with scalar outcome variable Y and p-dimensional covariate X is formulated as

Y = g(X^T β) + ε,  E(ε|X) = 0,  (0.0.5)

where g(·) is an unknown smooth function, β is a p-dimensional unknown parameter vector, and ε is the error term whose conditional expectation given X is zero. Here the notation X^T in (0.0.5) denotes the transpose of X. For identifiability, we assume that the parameter vector β satisfies ‖β‖ = 1 and that the first component of β is positive, where ‖·‖ stands for the Euclidean norm. If the link function g(·) is given in advance, the SIM reduces to a generalized linear regression model; thus the SIM is comparatively flexible in model structure. Further, compared with fully nonparametric regression models, the SIM captures the information of Y through the one-dimensional variable X^T β. This feature lets the SIM retain better interpretability and avoid the curse of dimensionality that commonly occurs in nonparametric regression. Therefore, as a compromise between fully parametric and fully nonparametric regression models, the SIM has drawn much attention due to its wide use in several research fields such as economics and statistics; see Powell et al.
(1989) [69] and Ichimura (1993) [49].

Estimating the mean function in the SIM has been extensively discussed in the literature. For instance, Ichimura (1993) [49] proposed a semi-parametric least squares estimator for general structural models; Hardle and Stoker (1989) [44] developed an average derivative estimation method that yields an estimator converging to the true value of the index parameter at the rate n^{-1/2}; we call such an estimator root-n consistent. Xia et al. (2002) [85] proposed an adaptive approach, called minimum average variance estimation (MAVE), which can be applied to the SIM under weaker conditions; Cui et al. (2011) [17] introduced an estimating-function method to study the SIM; and Sheng and Yin (2013) [72] further proposed a new estimation method based on distance covariance. However, these estimators lose efficiency and can even be inconsistent in the presence of heteroscedasticity. Thus, heteroscedasticity testing is an important issue for the SIM.

In the second part of this dissertation, we develop two test statistics according to different model structures. The first is a kernel-smoothing nonparametric test directed against fully nonparametric heteroscedasticity. This test statistic can be used to check for heteroscedasticity without assuming any specific form of the variance function under the alternative hypothesis. However, when the dimension of the covariates is large, it may suffer from the curse of dimensionality in nonparametric estimation. When both the mean and variance functions share a common dimension reduction structure with the same index, a test incorporating the dimension reduction structure is suggested, so that the test largely avoids the curse of dimensionality. The interesting features are as follows.
When the dimension reduction structure holds, the second test has a faster convergence rate and can detect local alternative hypotheses distinct from the null at a faster rate than the first test can achieve. However, the second test may perform badly when the dimension reduction structure does not hold; that is, it is not robust against a general nonparametric model structure.

3. Non-sparse High-dimensional Transformation Models

Consider the following linear transformation model:

H(Y) = X^T β + ε,  (0.0.6)

where Y is an observable scalar random variable, H(·) denotes the true, known or unknown, monotone transformation function, X = (X_1, X_2, ..., X_p)^T is a p-dimensional predictor vector, β = (β_1, β_2, ..., β_p)^T is the regression parameter vector of interest, and the error term ε, which is independent of X, has a continuous distribution. This model, which circumvents the so-called curse of dimensionality, is widely used in many areas of modern science such as genetic microarrays, medical imaging, text recognition, finance and chemometrics.

When H(·) is a known function, model (0.0.6) includes the well-known proportional hazards model and proportional odds model, both of which have been extensively studied. On the other hand, when H(·) is unknown, it becomes a classical semi-parametric model, which has also been investigated in the literature. Broadly speaking, the distinguishing feature of high-dimensional regression is that the dimension p is high while the sample size n is relatively small. Furthermore, in high dimensions it is common that many of the predictors are insignificant to the response, so variable selection becomes necessary. As a dimension-reduction tool, the transformation model appears naturally in high-dimensional environments, and many variable selection methods via shrinkage or penalization have been proposed.
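The idea behind the second, dimension-reduction-based check can be illustrated with a crude diagnostic: smooth the squared residuals against the one-dimensional index X^T β and look at how much the fitted conditional variance varies. This is only an intuition-building sketch under an assumed known β; the function name, kernel, bandwidth and spread measure are our choices, not the dissertation's test statistic.

```python
import numpy as np

def index_hetero_spread(x, beta, resid, h=0.25):
    # Smooth squared residuals against the single index X^T beta with a
    # Nadaraya-Watson (Gaussian-kernel) estimator, then report the
    # relative spread of the fitted conditional variance over a grid of
    # central index values.  Values near zero are consistent with
    # homoscedasticity; large values suggest variance changing with the index.
    idx = x @ beta
    r2 = resid ** 2
    lo, hi = np.quantile(idx, [0.05, 0.95])   # trim sparse tails of the index
    grid = np.linspace(lo, hi, 40)
    w = np.exp(-0.5 * ((grid[:, None] - idx[None, :]) / h) ** 2)
    var_hat = (w * r2).sum(axis=1) / w.sum(axis=1)
    return (var_hat.max() - var_hat.min()) / var_hat.mean()
```

Because the smoothing is against a scalar index rather than the full covariate vector, the diagnostic avoids the curse of dimensionality, mirroring the motivation for the second test.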
Those methods include, but are not limited to, the penalized partial likelihood method (Tibshirani, 1997 [77]), the penalized marginal likelihood method (Lu and Zhang, 2007 [61]), the martingale estimating equation based method (Zhang et al., 2010 [93]), and the penalized smoothed rank correlation method (Lin and Peng, 2012 [57]). Another group of important approaches is based on the sure screening idea of Fan and Lv (2008) [33]; see, e.g., Zhu et al. (2011) [98] and Li et al. (2012) [55].

It is noted that, under some technical conditions, estimators for sparse linear models attain near-oracle risks of order s log p/n, up to factors involving k, for the LASSO (Tibshirani, 1996 [76]) and the Dantzig selector (Candes and Tao, 2007 [7]), where s denotes the number of nonzero coefficients and k is related to a restricted eigenvalue assumption; see Zhang and Huang (2008) [91] and Peter et al. (2009) [66] for details. For a general nonlinear model, van de Geer (2008) [81] established a similar result for ℓ1-penalized estimators, while in the context of generalized linear models Fan and Lv (2011) [34] proposed a class of penalized likelihood approaches achieving a comparable rate. However, when p and s are large, these risks become very large and can even be unacceptable for practical applications. In order to reduce the risk and obtain a strictly faster rate of convergence, Belloni and Chernozhukov (2013) [2] recently proposed a least squares estimator after model selection for linear models.

Even more noteworthy, in practice the sparsity condition may not be reasonable, since it cannot be detected or verified in real data analysis. To see the effect of this condition on estimation, we assume for simplicity that the first q predictors, indexed by L = {1, 2, ..., q}, are relatively significant, and the last p - q predictors are insignificant but not necessarily irrelevant. We write X and β as X = (Z^T, U^T)^T and β = (v^T, u^T)^T, where Z = (X_1, ..., X_q)^T and U = (X_{q+1}, ..., X_p)^T.
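The selection effect of the LASSO that underlies the working-model construction below can be seen in the one case where the LASSO has a closed form: an orthonormal design, where it reduces to soft-thresholding the least-squares coefficients. This is a textbook special case for intuition only, not the estimator studied in the dissertation.

```python
import numpy as np

def lasso_orthonormal(X, y, lam):
    # In the special case of an orthonormal design (X^T X = n I) the
    # LASSO solution is soft-thresholding of the least-squares
    # coefficients z_j = x_j^T y / n at level lam: coefficients with
    # |z_j| <= lam are set exactly to zero, which is how the LASSO
    # performs variable selection.
    n = X.shape[0]
    z = X.T @ y / n
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
```

For example, with X = 2 I_4 (so X^T X = 4 I), true coefficients (3, 0.1, -2, 0) and lam = 0.5, the LASSO returns (2.5, 0, -1.5, 0): the weak second predictor is dropped along with the irrelevant fourth, and the surviving coefficients are shrunk toward zero.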
Correspondingly, we consider the following working model:

H(Y) = Z^T v + e,  (0.0.7)

where e = U^T u + ε denotes the error term. A natural way of obtaining a working model of this kind is through variable selection such as the LASSO. However, some significant variables may be omitted from the true model after variable selection, especially when model (0.0.6) is not sparse. In this case, if Z is correlated with some components of U, then in general

E(e | Z) ≠ 0.

This inequality implies that the working model (0.0.7) is biased. Consequently, it becomes a challenging task to obtain a consistent estimator of v using traditional estimation methods. Borrowing terminology from the econometric literature, predictors in model (0.0.7) are called endogenous (Fan and Liao, 2014 [32]); they often correspond to the relatively significant ones in the full model (0.0.6). Therefore, it is not reasonable to expect an unbiased working model that has an identical form to its full model.

In the third part of this dissertation, we first define the relatively significant predictors in a non-sparse model after using the LASSO and identify the post-LASSO selection working model. Then, by using a quasi-instrumental variable, we reconstruct the bias-corrected post-LASSO selection model as an unbiased partially linear model, which is not a traditional partially linear model. We also obtain root-n consistency of the estimator in the context of non-sparse high-dimensional transformation models. Further, we establish asymptotic normality for the coefficients associated with the relatively significant predictors. Lastly, since the estimator of this new model involves nonparametric estimation, which may result in unstable estimation, we suggest a criterion to reduce the dimension of the quasi-instrumental variable to a lower value without much loss of information.
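The bias caused by omitting a correlated predictor can be demonstrated numerically in the simplest case: H the identity, one retained predictor Z and one omitted predictor U. This toy simulation (all names and parameter values are our own choices) shows the least-squares slope of the working model converging to v + rho·u rather than to v.

```python
import numpy as np

def working_model_slope(n=200000, rho=0.7, v=1.0, u=0.5, seed=0):
    # Toy version of the endogeneity in the working model: Z is kept,
    # U is omitted, and Corr(Z, U) = rho.  Regressing Y on Z alone,
    # the OLS slope converges to v + rho * u rather than v, because
    # the working-model error e = u * U + eps satisfies E(e | Z) != 0.
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    u_var = rho * z + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    y = v * z + u * u_var + rng.standard_normal(n)
    return (z @ y) / (z @ z)   # OLS slope of Y on Z only
```

With the defaults (v = 1, u = 0.5, rho = 0.7) the slope estimate concentrates near 1.35, not 1, illustrating why a bias correction such as the quasi-instrumental-variable construction is needed before v can be estimated consistently.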
Our limited numerical experience shows that this simple criterion works quite well.

Lastly, we summarize the main results of this dissertation and propose directions for further study.
Keywords/Search Tags: Expectation dependence, Nonparametric Monte Carlo test, Test of Kolmogorov-Smirnov type, Heteroscedasticity check, Single index model, Nonparametric estimation, Dimension reduction, Transformation models, Instrumental variable, Variable selection