Quantile Feature Screening For Ultra High Dimensional Censored Data

Posted on: 2022-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Tian
Full Text: PDF
GTID: 2480306485984029
Subject: Statistics
Abstract/Summary:
Ultra-high dimensional data arise widely in bioinformatics, image processing, and economics. In such data the number of covariates is much larger than the sample size and grows as the sample size grows, yet only a few variables are actually relevant, so the data are sparse. If variables unrelated to the response are selected during statistical modeling, they obscure the relationships among the variables and raise the cost of observing those variables in the future. Important variables therefore need to be identified by screening in order to reduce the dimension of the covariates. For ultra-high dimensional data, traditional variable selection methods have a high computational cost, and their statistical accuracy and algorithmic stability are challenged. To overcome these problems, feature screening methods represented by SIS (sure independence screening) have attracted much attention, and many effective methods have been developed.

This thesis studies feature screening for ultra-high dimensional censored data. When the response variable is right censored, the features strongly correlated with the response must be selected from the observed data. Because the response is not completely observed, applying SIS and similar methods directly leads to large bias, while using only the fully observed part of the sample does not make full use of the information in the sample. Although many scholars have studied feature screening for ultra-high dimensional censored data, existing methods generally rely on a specified model or assume that the censoring variable is unrelated to the covariates, and they are not stable in use.

In this thesis, conditional quantiles are used to study feature screening for ultra-high dimensional data under both non-random and random censoring. The conditional quantile makes it possible to transform the censoring problem into a problem with completely observed data. When the response variable is non-randomly right censored, the conditional quantile of the observed response is used to measure the correlation between each feature and the response, and this correlation is then used for screening. When the response variable is randomly right censored, the censoring variable is allowed to be correlated with some active variables (variables related to the response). The conditional quantile of the observed response given the censoring variable and the feature can then be used to measure the correlation between each feature and the response. Under certain assumptions, the screening method for non-random censoring has the sure screening property and ranking consistency; when the censoring variable is related to some active variables, the screening method for random censoring also has the sure screening property and ranking consistency.

Simulation results show that, compared with existing methods, under non-random censoring the method based on monotone invariance is simpler in computational form and has the same screening ability as other methods based on conditional quantiles; under random censoring, when the covariates are related to the censoring variable, the proposed method has a comparative advantage.
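The idea of screening with conditional quantiles of the observed response can be illustrated with a small sketch. It is not the estimator proposed in the thesis, whose exact form is not given in this abstract. It assumes the simplest non-random case, a fixed censoring threshold c, where monotone invariance of quantiles gives Q_tau(min(T, c) | X) = min(Q_tau(T | X), c), so quantiles of the observed response can be used directly. The slicing-based utility, the function names `quantile_screening_utility` and `screen`, and all simulation settings are hypothetical choices made for illustration.

```python
import numpy as np


def quantile_screening_utility(x, y, tau=0.5, n_slices=4):
    """Mean squared gap between slice-wise and marginal tau-quantiles of y.

    The sample is cut into slices by the empirical quantiles of x. If y is
    unrelated to x, the tau-quantile of y in every slice should be close to
    the marginal tau-quantile, so the utility should be small.
    """
    q_marginal = np.quantile(y, tau)
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_slices + 1))
    # Map each observation to a slice index in 0 .. n_slices - 1.
    slice_ids = np.clip(
        np.searchsorted(edges, x, side="right") - 1, 0, n_slices - 1
    )
    gaps = []
    for s in range(n_slices):
        mask = slice_ids == s
        if mask.any():
            gaps.append((np.quantile(y[mask], tau) - q_marginal) ** 2)
    return float(np.mean(gaps))


def screen(X, y, tau=0.5, n_slices=4, keep=10):
    """Rank the columns of X by utility and return the top `keep` indices."""
    utilities = np.array(
        [
            quantile_screening_utility(X[:, j], y, tau, n_slices)
            for j in range(X.shape[1])
        ]
    )
    ranking = np.argsort(-utilities)  # largest utility first
    return ranking[:keep], utilities


# Simulated example (all settings are assumptions for illustration only).
rng = np.random.default_rng(0)
n, p = 200, 1000                      # sample size much smaller than dimension
X = rng.normal(size=(n, p))
# Latent response depends only on the first two features (the active set).
t = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=n)
c = 2.0                               # fixed, non-random censoring threshold
y = np.minimum(t, c)                  # observed, right-censored response

selected, utilities = screen(X, y, tau=0.5, keep=20)
print(sorted(selected.tolist()))
```

Each feature is scored separately, so the cost grows linearly in the number of features, which is the main appeal of marginal screening in the ultra-high dimensional setting. The sketch does not handle random censoring, where the abstract describes conditioning on the censoring variable as well.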
Keywords/Search Tags: ultra-high dimensional data, censored data, feature screening, conditional quantile