| In biomedicine, economics, sociology, and related fields, researchers often encounter data that are heterogeneous, ultra-high dimensional, and contain qualitative categorical variables. Statistical modeling of such data is an active and extremely challenging topic in modern statistical research. Xie et al. (2020) [1] studied the heterogeneous information carried by explanatory variables, together with the relevant statistics of qualitative categorical variables within a given category, in the ultra-high dimensional setting, and proposed a category-adaptive variable screening method for ultra-high dimensional heterogeneous categorical data. That method applies only to datasets with categorical responses; it does not cover variable screening for datasets with continuous responses and heterogeneous structure. Building on Xie et al. (2020), this paper focuses on adaptive variable screening for ultra-high dimensional heterogeneous data and extends both the scope of application and the framework of the method. The main contributions are as follows. 1. For data with a continuous response and heterogeneous structure, this paper studies variable screening under the assumption that the data are heterogeneous and sparse. The continuous response is divided into ordinal categories so that the relevant explanatory variables can be screened under the heterogeneous structure, and an estimator of the dividing thresholds is also given. The method is shown to have the sure screening property and the ranking consistency property. 2. For heterogeneous, ultra-high dimensional data, the properties of the empirical quantiles of the explanatory variables are studied. A dummy variable is constructed to measure the difference between the marginal quantile and the conditional quantile of each explanatory variable, so that the important explanatory variables, those related to the response within a specific category, can be screened. Based on this framework, this paper proposes a category-adaptive quantile-based variable screening method for ultra-high dimensional heterogeneous categorical data. This method is also shown to have the sure screening property and the ranking consistency property. The two methods share a similar foundation: the difference between the marginal distribution and the conditional distribution of an explanatory variable is treated as a marginal effect and used to screen relevant variables. Moreover, under general conditions, the quantile-based method is shown to have the sure screening property and ranking consistency. Finally, simulations and real data analysis show that the method performs well in a variety of settings, effectively extracts heterogeneous information across categories, and can identify the true model in complex data. |
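
The screening idea summarized above can be illustrated with a minimal sketch. It is not the paper's actual statistic or threshold estimator, which the abstract does not specify. The sketch assumes that the continuous response is sliced at equal-frequency empirical quantiles, that the dummy variable is the indicator `X <= q_tau` with `q_tau` the marginal empirical quantile, and that the marginal effect is the absolute gap between the conditional and marginal frequencies of that indicator. All function names (`empirical_quantile`, `slice_response`, `screening_utility`, `screen`) and the simulated data are hypothetical.

```python
import math
import random


def empirical_quantile(values, tau):
    """Empirical tau-quantile: smallest order statistic with ECDF >= tau."""
    ordered = sorted(values)
    idx = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[idx]


def slice_response(y, n_slices):
    """Map a continuous response to ordinal labels 0..n_slices-1.

    Assumption: thresholds are equal-frequency empirical quantiles of y.
    """
    thresholds = [empirical_quantile(y, s / n_slices) for s in range(1, n_slices)]
    # The label is the number of thresholds that the value strictly exceeds.
    return [sum(v > t for t in thresholds) for v in y]


def screening_utility(x, labels, category, tau=0.5):
    """Assumed marginal effect: |P(X <= q | Y = category) - P(X <= q)|."""
    q = empirical_quantile(x, tau)
    dummy = [1 if v <= q else 0 for v in x]  # dummy variable built from the quantile
    in_category = [d for d, lab in zip(dummy, labels) if lab == category]
    if not in_category:
        raise ValueError("no observations in the requested category")
    marginal = sum(dummy) / len(dummy)
    conditional = sum(in_category) / len(in_category)
    return abs(conditional - marginal)


def screen(columns, labels, category, keep, tau=0.5):
    """Rank predictors by utility for one category and keep the top `keep`."""
    scores = [screening_utility(col, labels, category, tau) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:keep], scores


# Simulated example (hypothetical data): only predictor 0 drives the response.
random.seed(0)
n, p = 300, 50
columns = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2 * columns[0][i] + random.gauss(0, 1) for i in range(n)]

labels = slice_response(y, n_slices=3)
selected, scores = screen(columns, labels, category=2, keep=5)
```

Here `selected` holds the indices of the five predictors with the largest utilities for the top response slice. Running `screen` once per category gives a category-specific ranking, which is what makes the procedure category-adaptive in this sketch.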