Statistical Analysis On Several Models Of Missing Data And High-dimensional Data

Posted on: 2013-08-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L P Zhu
GTID: 1260330425962084
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Large amounts of data with missing values are often collected in social research, biomedicine, and economic management, for a variety of reasons. At the same time, as technology develops, more and more high-dimensional data are becoming available in fields such as genetics, the life sciences, and financial mathematics. Traditional statistical methods are no longer suitable for missing or high-dimensional data, so how to draw valid inferences in these settings has attracted the attention of many scholars. Missing data have been studied extensively over the past 80 years, and a series of effective methods have been proposed (see Little and Rubin (2002), Cao (2009)). For high-dimensional data, which are typically sparse, variable selection is one of the core problems and has been a hot topic in the statistical community in recent years. Many effective variable selection methods have been proposed (see Fan and Lv (2010), Candes and Tao (2007), etc.). However, existing work on statistical inference for missing data and for high-dimensional data is still not sufficient. This dissertation further studies statistical inference for linear functionals, for estimating equations, and for the redundancy of estimating equations under missing data, as well as the variable selection problem for high-dimensional data.
Statistical inference for linear functionals, such as the mean, higher moments, and mixed higher moments of a variable, is one of the important problems in statistics. Chapter 2 discusses linear functionals under missing data. Missing values are usually imputed with an estimator of the conditional expectation. Assuming a parametric or nonparametric structure for the conditional expectation, however, carries the risk of model misspecification or leads to the high-dimensional problem of nonparametric estimation.
Hu, Follmann and Qin (2010) proposed the repairable condition for the mean function, which both avoids the high-dimensional problem effectively and keeps the estimating function unbiased. Accordingly, a repairable condition for linear functionals is proposed here for inference on linear functionals under missing data, and the asymptotic results are obtained. Simulations show the advantages of our methods.
Many traditional statistical inference methods, such as least squares and maximum likelihood, can be viewed as inference based on estimating equations, so inference for estimating equations is quite general and has received increasing attention over the past 20 years. The literature on estimating equations under missing data, however, is limited. Wang and Chen (2009) and Zhou, Wan and Wang (2008) each impute the missing estimating function with a nonparametric estimator of its conditional mean, using different methods. Both imputed estimating functions are unbiased, but the corresponding empirical likelihood ratios do not converge to the standard chi-square distribution, which makes it difficult to construct confidence intervals. The main reason is the use of partial imputation. In Chapter 3, inverse probability weighting and generalized inverse probability weighting are applied to statistical inference for estimating equations under missing data. Consistency, asymptotic normality, and the asymptotic behavior of the likelihood ratio are obtained. The results show that the two methods have similar asymptotic properties. The empirical likelihood ratios are asymptotically chi-square, so the adjusted empirical likelihood is not needed.
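As a rough illustration of inverse probability weighting in the simplest estimating-equation setting, the sketch below estimates a mean from data that are missing at random. It is not the dissertation's procedure: the estimating function g(y, theta) = y - theta, the simulated data, the function name `ipw_mean`, and the assumption that the selection probability is known are all chosen here for illustration only.

```python
import numpy as np

def ipw_mean(y, observed, propensity):
    """Solve sum_i (R_i / pi_i) * (y_i - theta) = 0 for theta."""
    weights = observed / propensity            # R_i / pi_i, zero for missing rows
    y_filled = np.where(observed, y, 0.0)      # missing y never contributes
    return np.sum(weights * y_filled) / np.sum(weights)

# Simulated example (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
y_full = 1.0 + 2.0 * x + rng.normal(size=n)    # true mean of y is 1
propensity = 1.0 / (1.0 + np.exp(-(0.5 + x)))  # P(observed | x), taken as known
observed = rng.random(n) < propensity
y = np.where(observed, y_full, np.nan)         # y is missing when not observed

complete_case_est = np.nanmean(y)              # ignores missingness, biased upward
ipw_est = ipw_mean(y, observed, propensity)    # reweights observed rows

print(f"complete-case: {complete_case_est:.3f}, IPW: {ipw_est:.3f}")
```

In this simulation, units with large x are more likely to be observed, so the complete-case average overestimates the mean, while the weighted estimating equation corrects for the selection.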
The simulations further illustrate the advantages of our method.
A peculiar phenomenon appears in empirical likelihood estimation of the parameters of estimating equations under missing data: it is better to weight by an estimate of the selection probability function than by the true one, even when the true one is known (Qin, Zhang and Leung (2009)). At the same time, in practical problems many estimating equations can be constructed, so the question arises whether adding estimating equations improves the efficiency of the estimator of the parameter of interest. Therefore, Chapter 4 introduces the concepts of redundancy and partial redundancy of estimating equations, as well as redundancy of parameters, gives equivalent conditions for redundancy, and uses them to explain the peculiar phenomenon. Simulations illustrate the redundancy phenomenon for estimating equations.
Variable selection for high-dimensional data is currently a popular topic in statistics. Variable selection methods fall into two types. One is the class of penalization methods based on the linear model, such as the lasso (Fan and Lv (2010)); the other is linear programming, which solves a norm minimization problem under constraints on statistical correlation, such as the Dantzig selector. Because of its algorithmic advantages, the Dantzig selector has received much attention. However, unless the irrepresentable conditions are satisfied, model selection by the Dantzig selector is inconsistent; moreover, when important and unimportant variables are strongly correlated, the Dantzig selector generally performs poorly. To address these two problems, Chapter 5 proposes two new methods: Ridge Dantzig, which combines the Dantzig selector with ridge regression, and bodantzig, which is based on the bootstrap. The basic idea of both is that the important variables will always be selected with high probability. The simulations and empirical results show the advantages of these two methods.
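The linear programming view of the Dantzig selector mentioned above can be sketched as follows. This is the plain Dantzig selector of Candes and Tao (2007), which minimizes the l1 norm of the coefficients subject to a bound on the correlation between the residuals and each predictor; it is not the Ridge Dantzig or bodantzig method of Chapter 5. The function name `dantzig_selector`, the simulated data, and the choice of the tuning parameter `lam` are assumptions made here for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """min ||beta||_1  s.t.  ||X^T (y - X beta)||_inf <= lam, solved as an LP."""
    p = X.shape[1]
    G = X.T @ X
    b = X.T @ y
    # Write beta = u - v with u, v >= 0, so that ||beta||_1 <= sum(u) + sum(v).
    c = np.ones(2 * p)
    # -lam <= b - G (u - v) <= lam, expressed as two blocks of "<=" constraints.
    A_ub = np.block([[G, -G], [-G, G]])
    b_ub = np.concatenate([lam + b, lam - b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]

# Simulated sparse regression (illustrative values only).
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -2.0]                    # only two important variables
y = X @ beta_true + 0.5 * rng.normal(size=n)

lam = 0.5 * np.sqrt(2 * n * np.log(p))         # a common rate; an assumption here
beta_hat = dantzig_selector(X, y, lam)
print(np.round(beta_hat, 2))
```

Splitting the coefficients into positive and negative parts is the standard way to turn an l1 objective into a linear one; any LP solver could be substituted for `linprog`.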
Keywords/Search Tags: missing data, high-dimensional data, linear functionals, estimating equation, redundancy, Dantzig