Feature Screening for Ultrahigh Dimensional Data With Surrogate Data

Posted on: 2019-08-05, Degree: Master, Type: Thesis
Country: China, Candidate: J Zhang
GTID: 2370330545970148, Subject: Mathematics
Abstract/Summary:
With the development of big data, ultrahigh dimensional data have become widely available in a large variety of scientific fields, such as biomedical imaging, gene expression and proteomics studies, and tumor classification. In such data the dimension of the covariates is allowed to grow at an exponential rate of the sample size, while only a small number of predictors contribute to the response; that is, the sparsity assumption holds. Under these conditions traditional data analysis methods become inaccurate, and their results may be biased and inefficient. To analyze ultrahigh dimensional data more accurately and to extract more useful information from the data set, the dimension of the data must be reduced. Because dimension reduction addresses this problem effectively and has broad application prospects, many statisticians have studied ultrahigh dimensional feature screening. These methods generally proceed in two steps. First, feature screening reduces the dimension of the data to the order of the sample size while retaining all important variables. Second, variable selection is carried out on the reduced set of variables.

In studying the relationship between a response variable and a set of explanatory variables, covariates are often missing because the variables are difficult or costly to obtain. Simply discarding all observations with incomplete data can lead to biased and inefficient estimates. It is therefore particularly important to find ways of handling missing data. Many statisticians continue to study this issue in depth, and the body of theoretical results keeps growing.

This paper is concerned with feature screening for ultrahigh dimensional data with surrogate data when covariates are missing at random. First, starting from the simplest linear model, nonparametric imputation is used to construct the connection between the precisely observed data and the corresponding surrogate data. We show that the proposed nonparametric imputation feature screening procedure for ultrahigh dimensional surrogate data enjoys the sure screening property in the sense of Fan and Lv (2008). We then propose a robust feature screening procedure built upon the weighted nonparametric imputation technique, without any parametric model assumptions. When the dimension of the surrogate variable is not high, both the inverse probability weight function and the augmented conditional expectation function can be estimated by nonparametric fitting, which ensures the consistency of the screening index. When the dimension of the surrogate variable is high, parametric models can be assumed for the inverse probability weight function and the conditional expectation function. Even if one of the two parametric models is misspecified, the double robustness property still guarantees a consistent estimate of the screening index. In addition to proving the theoretical properties, we conduct Monte Carlo simulation studies to examine the performance of the proposed procedure, and we apply it to a real data set to evaluate and illustrate the proposed methods.
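
The first step described above, imputing missing covariates from surrogate data and then ranking covariates by a marginal screening index, can be illustrated with a minimal Python sketch. It is not the estimator developed in the thesis. It assumes one scalar surrogate per covariate, a Gaussian-kernel Nadaraya-Watson imputation with a fixed bandwidth, and the absolute marginal correlation of Fan and Lv (2008) as the screening index. The function names (nw_impute, screen, run_demo), the simulated model, and all tuning values are hypothetical choices made for illustration.

```python
import numpy as np


def nw_impute(x, w, observed, bandwidth):
    """Fill missing entries of x with a Nadaraya-Watson estimate of E[x | w].

    Assumption: a Gaussian kernel with a fixed bandwidth; the estimate is
    fitted only on the rows where x was actually observed.
    """
    x_filled = x.copy()
    w_obs, x_obs = w[observed], x[observed]
    w_mis = w[~observed]
    # Kernel weights between each missing row and every observed row.
    K = np.exp(-0.5 * ((w_mis[:, None] - w_obs[None, :]) / bandwidth) ** 2)
    denom = K.sum(axis=1)
    safe = np.where(denom > 0, denom, 1.0)
    # Fall back to the observed mean if all kernel weights underflow to zero.
    x_filled[~observed] = np.where(denom > 0, (K @ x_obs) / safe, x_obs.mean())
    return x_filled


def screen(y, X, W, observed, d, bandwidth=0.3):
    """Rank covariates by |marginal correlation with y| after imputation."""
    p = X.shape[1]
    y_std = (y - y.mean()) / y.std()
    score = np.empty(p)
    for j in range(p):
        xj = nw_impute(X[:, j], W[:, j], observed[:, j], bandwidth)
        xj = (xj - xj.mean()) / xj.std()
        score[j] = abs(np.mean(xj * y_std))
    keep = np.argsort(score)[::-1][:d]  # indices of the d largest scores
    return keep, score


def run_demo(seed=0, n=200, p=1000):
    """Simulated example (hypothetical setup): only covariates 0, 1, 2 matter."""
    rng = np.random.default_rng(seed)
    X_full = rng.standard_normal((n, p))
    W = X_full + 0.3 * rng.standard_normal((n, p))  # noisy surrogate, always seen
    y = 3.0 * X_full[:, :3].sum(axis=1) + rng.standard_normal(n)
    # Missing at random: the chance of observing X depends only on W.
    prob_obs = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * W)))
    observed = rng.random((n, p)) < prob_obs
    X = np.where(observed, X_full, np.nan)  # NaN marks a missing covariate value
    d = int(n / np.log(n))  # a common choice of screened model size
    return screen(y, X, W, observed, d)


if __name__ == "__main__":
    keep, score = run_demo()
    print("active covariates retained:", {0, 1, 2} <= set(keep.tolist()))
```

The sketch covers only the imputation-based screening step. The weighted, doubly robust procedure would additionally require an estimate of the observation probability, which is omitted here.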
Keywords/Search Tags:ultrahigh dimensional, surrogate data, feature screening, nonparametric imputation