
Feature Screening for Ultra-High-Dimensional Complex Survival Data Based on Distance Correlation

Posted on: 2021-02-18 | Degree: Master | Type: Thesis
Country: China | Candidate: S Y Lu | Full Text: PDF
GTID: 2430330605463031 | Subject: Statistics
Abstract/Summary:
With the rapid development of science and technology (especially computing and storage capabilities), ultra-high-dimensional data increasingly arise in many fields of scientific research, such as biomedicine, economics and brain imaging. The "curse of dimensionality" of ultra-high-dimensional data poses great challenges to statistical analysis and inference, and feature screening is one of the indispensable statistical tools for overcoming these challenges. Over the past ten years, feature screening for ultra-high-dimensional data has attracted the attention of many statisticians and has made important progress. However, research on feature screening for ultra-high-dimensional complex data is still limited, and many problems remain to be solved. Based on distance correlation, this thesis studies feature screening for ultra-high-dimensional survival data with prior information and for ultra-high-dimensional semi-competing risks data.

In the second chapter, we study conditional feature screening for ultra-high-dimensional survival data based on conditional distance correlation. In practical problems, researchers often know beforehand that one or several of the many covariates are important, and this information should be taken into account when constructing a feature screening method in order to improve the screening results. For ultra-high-dimensional survival data with such prior information, the chapter proposes a feature screening method based on conditional distance correlation. To adapt the conditional distance correlation to right-censored survival data and to make the method robust with respect to the covariates, the procedure has three steps. First, each covariate and the survival event time are transformed by their respective distribution functions. Second, the conditional distance correlation of the transformed variables is used as the measure of dependence between each covariate and the survival response. Finally, this measure is used for feature screening. Through this transformation, the conditional distance correlation effectively characterizes the dependence between each covariate and the survival response. The sure screening property of the proposed method is established under rather mild assumptions. Numerical simulations show that the proposed method clearly outperforms the existing conditional feature screening methods for ultra-high-dimensional survival data in the literature. The chapter also illustrates the effectiveness of the proposed method through a real data analysis.

In the third chapter, we study feature screening for ultra-high-dimensional semi-competing risks data based on distance correlation. Semi-competing risks data differ from standard survival data: an individual may experience two related types of events, a non-terminal event and a terminal event. Once an individual experiences the terminal event, the non-terminal event can no longer occur, i.e. the non-terminal event can be right-censored by the terminal event, but not vice versa. It is therefore not appropriate to perform feature screening for the non-terminal event and the terminal event separately. Based on distance correlation, the chapter proposes and studies a joint feature screening method for ultra-high-dimensional semi-competing risks data. As in the second chapter, we first transform each covariate by its distribution function and the two event times by the joint distribution function of the non-terminal and terminal events. Second, we compute the distance correlation of the transformed variables, and finally we use this correlation for feature screening. The proposed method can select the covariates that have an important influence on the non-terminal event and those that have an important influence on the terminal event, and it can also identify the covariates that influence both events. Under rather mild technical assumptions, we show that the proposed joint feature screening procedure enjoys good theoretical properties. An adaptive threshold rule is further proposed to identify the important covariates and determine their number simultaneously. Extensive numerical studies examine the finite-sample performance of the joint feature screening method based on distance correlation. Lastly, we illustrate the proposed joint feature screening procedure with a real example.

A summary of this thesis and directions for further research are given in the fourth chapter.
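The first step of the second chapter's method transforms the survival event time by its distribution function. Under right censoring that distribution function has to be estimated, and the abstract does not say how. The Python sketch below assumes the common choice F_hat(t) = 1 - S_KM(t), where S_KM is the Kaplan-Meier estimator, evaluated at each observed time. The function name `km_cdf_at_observed` and the inputs `y` (observed times) and `delta` (1 = event observed, 0 = censored) are hypothetical, not taken from the thesis.

```python
import numpy as np


def km_cdf_at_observed(y: np.ndarray, delta: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return 1 - S_KM(y_i) for each observed time y_i.

    y     : observed times min(T, C)
    delta : event indicators (1 = event observed, 0 = right-censored)
    """
    event_times = np.unique(y[delta == 1])  # sorted distinct event times
    if event_times.size == 0:
        return np.zeros(len(y))  # no events: the estimate stays at S = 1
    at_risk = np.array([(y >= t).sum() for t in event_times])
    events = np.array([((y == t) & (delta == 1)).sum() for t in event_times])
    surv = np.cumprod(1.0 - events / at_risk)  # right-continuous S_KM at each event time
    # Index of the last event time <= y_i; -1 means y_i precedes every event time.
    idx = np.searchsorted(event_times, y, side="right") - 1
    s_at_y = np.where(idx >= 0, surv[np.maximum(idx, 0)], 1.0)
    return 1.0 - s_at_y


# Toy usage with simulated exponential event and censoring times (illustrative only).
rng = np.random.default_rng(0)
t = rng.exponential(1.0, size=8)
c = rng.exponential(2.0, size=8)
y, delta = np.minimum(t, c), (t <= c).astype(int)
print(km_cdf_at_observed(y, delta))
```

With no censoring (`delta` all ones) this estimate reduces to the empirical distribution function F_n(y_i) = #{j: y_j <= y_i}/n, the same transform applied to the covariates.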
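The second chapter scores each covariate by its conditional distance correlation with the survival response, given a covariate known in advance to be important. The abstract gives no formulas, so the Python sketch below is an illustration under stated assumptions: it uses a kernel-weighted analogue of the usual V-statistic for distance covariance (computed by weighted double centring), Gaussian kernel weights in a single conditioning covariate `z`, and it scores each covariate by averaging the squared conditional distance correlation over the observed values of `z`. Censoring is ignored (the response goes through its plain empirical distribution function), and the bandwidth rule and all function names are hypothetical.

```python
import numpy as np


def ecdf(v: np.ndarray) -> np.ndarray:
    """Empirical distribution function F_n evaluated at each sample point."""
    return np.searchsorted(np.sort(v), v, side="right") / len(v)


def pairwise_dist(v: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix of an (n,) or (n, d) sample."""
    v = v.reshape(len(v), -1)
    return np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))


def weighted_dcov2(a: np.ndarray, b: np.ndarray, w: np.ndarray) -> float:
    """Squared distance covariance of the sample reweighted by w (w sums to 1)."""

    def centre(d: np.ndarray) -> np.ndarray:
        row = d @ w  # weighted row means of a symmetric distance matrix
        return d - row[:, None] - row[None, :] + w @ row

    return float(w @ (centre(a) * centre(b)) @ w)


def conditional_dcor_index(X, y, z, bandwidth=None):
    """Mean over i of the squared conditional dCor of (F_n(X_k), F_n(y)) given z = z_i."""
    n, p = X.shape
    if bandwidth is None:
        # Hypothetical default: normal-reference (Silverman) rule of thumb.
        bandwidth = 1.06 * np.std(z) * n ** (-1 / 5)
    # Row i holds the Gaussian kernel weights used to condition on z = z[i].
    k = np.exp(-0.5 * ((z[:, None] - z[None, :]) / bandwidth) ** 2)
    weights = k / k.sum(axis=1, keepdims=True)
    dy = pairwise_dist(ecdf(y))
    vy = [weighted_dcov2(dy, dy, w) for w in weights]
    index = np.zeros(p)
    for j in range(p):
        dx = pairwise_dist(ecdf(X[:, j]))
        vals = []
        for w, v_y in zip(weights, vy):
            denom = np.sqrt(weighted_dcov2(dx, dx, w) * v_y)
            vals.append(weighted_dcov2(dx, dy, w) / denom if denom > 0 else 0.0)
        index[j] = np.mean(vals)
    return index


# Toy example: covariate 0 matters once z (the "known important" covariate) is accounted for.
rng = np.random.default_rng(1)
n, p = 100, 15
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
y = z + 2 * X[:, 0] + 0.2 * rng.standard_normal(n)
index = conditional_dcor_index(X, y, z)
print(np.argsort(index)[::-1][:3])  # indices of the three highest-scoring covariates
```

Each conditioning point costs O(n^2) work per covariate, so the index costs O(n^3) per covariate; that is acceptable for a sketch but would need a faster formulation for large samples.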
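The third chapter's procedure (transform each covariate by its distribution function, transform the two event times by their joint distribution function, then rank covariates by distance correlation) can be sketched in Python as follows. This is a literal reading of the abstract under simplifying assumptions: neither event time is censored, plain empirical distribution functions stand in for whatever estimators the thesis uses, a fixed number `d` of covariates is kept instead of the adaptive threshold rule, and the function names are hypothetical. The distance correlation itself is the standard V-statistic estimator of Székely, Rizzo and Bakirov (2007).

```python
import numpy as np


def ecdf(v: np.ndarray) -> np.ndarray:
    """Empirical distribution function F_n evaluated at each sample point."""
    return np.searchsorted(np.sort(v), v, side="right") / len(v)


def joint_ecdf(t1: np.ndarray, t2: np.ndarray) -> np.ndarray:
    """Empirical joint distribution function F_n(t1_i, t2_i) at each sample point."""
    below = (t1[None, :] <= t1[:, None]) & (t2[None, :] <= t2[:, None])
    return below.mean(axis=1)  # share of j with t1_j <= t1_i and t2_j <= t2_i


def distance_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Sample distance correlation (V-statistic form) of Szekely, Rizzo and Bakirov (2007)."""

    def centred(v: np.ndarray) -> np.ndarray:
        v = v.reshape(len(v), -1)
        d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    a, b = centred(x), centred(y)
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return float(np.sqrt(max((a * b).mean(), 0.0) / denom)) if denom > 0 else 0.0


def dc_screen(X: np.ndarray, response: np.ndarray, d: int) -> np.ndarray:
    """Indices of the d covariates with the largest dCor(F_n(X_k), response)."""
    omega = np.array([distance_correlation(ecdf(X[:, k]), response)
                      for k in range(X.shape[1])])
    return np.argsort(omega)[::-1][:d]


# Toy data (hypothetical): covariate 0 drives the non-terminal time, covariate 1 the
# terminal time; no censoring and no semi-competing structure is simulated.
rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.standard_normal((n, p))
t_nonterminal = np.exp(X[:, 0] + 0.1 * rng.standard_normal(n))
t_terminal = np.exp(X[:, 1] + 0.1 * rng.standard_normal(n))
selected = dc_screen(X, joint_ecdf(t_nonterminal, t_terminal), d=5)
print(sorted(selected.tolist()))
```

Because only ranks enter through `ecdf` and `joint_ecdf`, the monotone `exp` in the toy data has no effect on the scores, which is one reading of the robustness the transformation is meant to provide. Replacing the fixed `d` with a data-driven cut-off is where the thesis's adaptive threshold rule would enter; its form is not given in the abstract, so it is not sketched here.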
Keywords/Search Tags: Distance correlation, Conditional distance correlation, Ultra-high dimensionality, Feature screening, Conditional feature screening