
Study On Ultrahigh Dimensional Feature Screening

Posted on: 2017-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: X X Li
Full Text: PDF
GTID: 2180330485498944
Subject: Mathematics
Abstract/Summary:
With the rapid advance of modern data-collection technology, researchers can collect ultrahigh-dimensional data at relatively low cost in diverse fields of scientific research, such as genomics, functional magnetic resonance imaging, tomography, and finance. However, many dimension reduction and variable selection methods may not perform well for ultrahigh-dimensional data because of the simultaneous challenges of computational expediency, statistical accuracy, and algorithmic stability. Since the sure independence screening (SIS) procedure was proposed for the linear model, statisticians have developed many screening methods for various statistical models and data types. Ultrahigh-dimensional discriminant analysis and the ultrahigh-dimensional linear model are two common and important problems in statistics that still leave considerable room for research.

In this paper, we first consider sure independence feature screening for ultrahigh-dimensional discriminant analysis. We propose a new method, robust rank screening (RRS), based on the difference between the conditional expectation and the unconditional expectation of the rank of each predictor's samples. We also establish the sure screening property for the proposed procedure under simple assumptions. The new procedure has several additional desirable characteristics. First, it is robust against heavy-tailed distributions, potential outliers, and small sample sizes in some categories. Second, it is model-free, requiring no specification of a regression model, and is directly applicable to settings with many categories. Third, its theoretical derivation is simple because the resulting statistics are bounded. Fourth, it is relatively inexpensive to compute because of the simple structure of the screening index.
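The rank-based idea described above can be illustrated with a minimal Python sketch. The abstract does not give the exact RRS formula, so the index below is an assumption: for each predictor, it takes the class-proportion-weighted squared gap between the mean scaled rank within each category and the overall mean scaled rank. The names `rank_screening_index` and `top_features` are hypothetical.

```python
import numpy as np

def rank_screening_index(X, y):
    """Hypothetical rank-based marginal utility for each column of X.

    X: (n, p) array of predictors; y: (n,) array of class labels.
    Assumed form (not taken from the thesis): weighted squared gap between
    the within-class mean rank and the overall mean rank.
    """
    n, p = X.shape
    # Ranks 1..n within each column (ties broken by position; fine for continuous data).
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    scaled = ranks / n                    # scaled ranks lie in (0, 1], so the index is bounded
    overall = scaled.mean(axis=0)         # unconditional mean rank, (n + 1) / (2n) for every column
    index = np.zeros(p)
    for label in np.unique(y):
        mask = (y == label)
        weight = mask.mean()              # empirical class proportion
        index += weight * (scaled[mask].mean(axis=0) - overall) ** 2
    return index

def top_features(index, d):
    """Return the positions of the d largest screening indices."""
    return np.argsort(index)[::-1][:d]
```

Because the index depends on the data only through ranks, a single extreme observation can move it only slightly, which is one way to see the robustness to heavy tails and outliers claimed above.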
Monte Carlo simulations and real data examples are used to demonstrate the finite sample performance.

Then, we consider feature screening for the linear model with multivariate responses and ultrahigh-dimensional covariates. Instead of considering each response individually, we work with the linear space spanned by the multivariate responses. Based on projection theory, we project each covariate onto the linear space spanned by the multivariate responses, propose a new screening procedure called projection screening (PS), and establish the sure screening property under some regularity assumptions. The original SIS work pointed out that marginal feature screening for the ultrahigh-dimensional linear model may encounter three typical difficulties: irrelevant variables that are highly correlated with the relevant variables can receive high priority in marginal correlation screening; a relevant variable can be marginally uncorrelated but jointly correlated with the response; and collinearity can exist among the variables. To address these difficulties and enhance the screening performance of the proposed procedure, we further develop an iterative projection screening (IPS) procedure. We assess the finite sample properties of the proposed procedure by Monte Carlo simulation studies and further illustrate the proposed methodology by empirical analysis of a real-life data set.
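The projection step can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not specify the PS statistic, so the index here is assumed to be the fraction of each centered covariate's squared norm that lies in the span of the centered responses. The function name `projection_screening_index` is hypothetical, and the response matrix is assumed to have full column rank.

```python
import numpy as np

def projection_screening_index(X, Y):
    """Hypothetical projection-based utility for each column of X.

    X: (n, p) covariates; Y: (n, q) multivariate responses with q < n.
    Assumed form (not taken from the thesis): squared norm of the projection
    of each centered covariate onto span(Y), divided by its squared norm.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal basis for the space spanned by the centered responses.
    Q, _ = np.linalg.qr(Yc)               # Q has shape (n, q)
    coords = Q.T @ Xc                     # coordinates of each covariate in that basis
    projected_norm = (coords ** 2).sum(axis=0)
    total_norm = (Xc ** 2).sum(axis=0)
    return projected_norm / total_norm    # lies in [0, 1] up to rounding
```

Covariates would then be ranked by this index and the largest ones retained. An iterative variant in the spirit of IPS would repeat the ranking after removing the contribution of the covariates already selected, but that step is not shown because the abstract does not describe it in detail.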
Keywords/Search Tags: Ultrahigh-dimensional data, Discriminant analysis, Robust rank screening, Multivariate responses, Projection screening