Quantile Feature Screening For Ultra High Dimensional Censored Data

Posted on:2022-08-27

Degree:Master

Type:Thesis

Country:China

Candidate:Z T Tian

GTID:2480306485984029

Subject:Statistics

Abstract/Summary:

Ultra-high-dimensional data arise widely in bioinformatics, image processing, and economics. In such data the number of covariates is much larger than the sample size and grows with it, yet only a few variables actually affect the response, so the data are sparse. If variables unrelated to the response are selected during statistical modeling, they obscure the relationships among the variables and raise the cost of continuing to observe those variables in the future. Important variables therefore need to be identified by screening so that the dimension of the covariates can be reduced. In ultra-high-dimensional settings, traditional variable selection methods have high computational cost, and their statistical accuracy and algorithmic stability are challenged. To overcome these problems, feature selection methods represented by SIS (sure independence screening) have attracted much attention, and many effective methods have been developed.

This thesis studies feature selection for ultra-high-dimensional censored data. When the response variable is right censored, the features strongly correlated with the response must be selected from the observed data. Because the response is not completely observed, applying SIS and similar methods directly leads to large bias, while using only the fully observed part of the sample does not make full use of the information it contains. Although many scholars have studied feature selection for ultra-high-dimensional censored data, existing methods are generally based on a specified model, or on the assumption that the censoring variable is independent of the covariates, and they are not stable in use.

In this thesis, conditional quantiles are used to study feature selection for ultra-high-dimensional data under both non-random and random censoring. Conditional quantiles allow the censoring problem to be transformed into a problem with completely observed data. When the response is right censored non-randomly, the conditional quantile of the observed response is used to measure the correlation between each feature and the response, and that correlation is then used for feature selection. When the response is randomly right censored, the censoring variable is allowed to be correlated with some active variables (variables related to the response). The conditional quantile of the observed response given the censoring variable and a feature can then be used to measure the correlation between that feature and the response.

Under certain assumptions, the feature selection method for non-random censoring has the sure screening property and ranking consistency; when the censoring variable is related to some active variables, the method for random censoring also has the sure screening property and ranking consistency. Simulation results show that, compared with existing methods, under non-random censoring the method based on monotone invariance is simpler in computational form and has the same screening ability as other methods based on conditional quantiles; under random censoring, if the covariates are related to the censoring variable, the proposed method has a comparative advantage.
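The abstract does not state the screening statistic itself, so the following is only a minimal Python sketch of the general idea, not the method of the thesis. It assumes a fixed (non-random) censoring point `c`, and it swaps in a simple marginal utility built from the quantile score `tau - 1{y <= q_tau}`, where `q_tau` is the sample quantile of the observed response. All function names, the simulated model, and the cutoff `n / log(n)` (a common convention in the screening literature) are assumptions made for illustration. The sketch also shows why quantiles suit censored data: a quantile commutes with the monotone map `t -> min(t, c)`, so whenever `q_tau` lies below `c`, the utilities computed from the censored response coincide with those computed from the unobserved true response.

```python
import numpy as np


def quantile_screening_utility(X, y, tau=0.5):
    """Hypothetical marginal utility: |mean((tau - 1{y <= q_tau}) * x_j)| per feature.

    X is an (n, p) covariate matrix and y the observed (possibly censored) response.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize each column so utilities are comparable across features.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Sample tau-quantile of the observed response.
    q = np.quantile(y, tau)
    # Quantile score: tau - 1{y <= q}.
    score = tau - (y <= q).astype(float)
    return np.abs(Xs.T @ score) / len(y)


def screen(X, y, tau=0.5, keep=None):
    """Return the indices of the `keep` features with the largest utility."""
    n = len(y)
    if keep is None:
        keep = int(n / np.log(n))  # assumed cutoff, not taken from the thesis
    util = quantile_screening_utility(X, y, tau)
    return np.argsort(-util)[:keep]


def simulate(n=300, p=1000, c=2.0, seed=0):
    """Toy data: only the first three features are active; response censored at c."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    t = 3.0 * X[:, :3].sum(axis=1) + rng.standard_normal(n)  # true response
    y = np.minimum(t, c)  # observed response under fixed right censoring
    return X, t, y


X, t, y = simulate()
selected = screen(X, y, tau=0.5)
print("active features retained:", {0, 1, 2} <= set(selected.tolist()))
```

In this toy setting the median of the observed response lies below the censoring point, so screening on `y` ranks the features exactly as screening on the uncensored `t` would. For quantile levels whose sample quantile reaches `c`, that equivalence no longer holds.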

Keywords/Search Tags:

ultra-high dimensional data, censored data, feature screening, conditional quantile

Related items

1	Adaptive Variable Screening For Ultra-High Dimensional Heterogeneous Data
2	Some Studies On Feature Screening Of Ultra-high-dimensional Longitudinal Data And Group Structured Data
3	Feature Screening Of Ultra-high Dimensional Classification Data With Exposure Variables
4	Feature Screening Based On Distance-related Ultra-high-dimensional Complex Survival Data
5	Research On Feature Selection Of Ultra-high-dimensional Competitive Risk Data Based On Correlation Rank
6	Grouped Feature Screening For Ultra-high Dimensional Data
7	Research On Feature Selection Methods Of Two Types Of Ultra-high Dimension Right Censored Data
8	Research On Feature Selection Method Without Model Constraints Under Ultra High Dimensional Data
9	Gini-Index Based Feature Screening For Ultrahigh Dimensional Categorical Data
10	Ultra-high Dimensional Missing Data Analysis Based On Model Averaging