
Feature Screening Of Ultra-high Dimensional Classification Data With Exposure Variables

Posted on: 2020-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 2370330602458654
Subject: Statistics
Abstract/Summary:
With the rapid development of information acquisition and transmission technology, the data we collect are high-dimensional and contain a large amount of irrelevant, redundant information, which has motivated research on feature screening methods. As data mining technology advances, more and more latent variables are revealed, and the effect of explanatory variables on the response variable sometimes depends on certain exposure variables, such as time or environmental factors. It is therefore important to screen for the covariates that are associated with the response variable conditional on the exposure variable. This thesis proposes conditional feature screening methods for ultrahigh dimensional classification data, covering both binary and multi-category response variables, based on the information entropy of the variables and on differences in information content.

For classification data with a binary response variable, a conditional information entropy feature screening method based on exposure variables (C-CIES) is proposed. An exposure variable is introduced into the CIES method, and a new screening index is constructed from the differences in the response variable across categories and the marginal conditional information entropy of the covariate. Without any model assumptions, the screening property of the index is proved theoretically and its ranking consistency property is established. Monte Carlo simulations were carried out in two settings: one in which the covariates are marginally independent of the response but jointly correlated with it, and one in which both marginal and joint associations are present. C-CIES was compared with the PC-SIS, IG-SIS, and CIES methods. The simulation results show that C-CIES is better at selecting the truly relevant variables in both settings, and that only C-CIES can select the covariates that are conditionally related to the response variable.

Following the idea of the W-CIES method, and given the exposure variable, a further screening index (CW-CIES) is constructed from the difference between the marginal conditional information entropy and the marginal unconditional information entropy of the covariate, weighted by the joint category probabilities of the exposure variable and the response variable. Without any model assumptions, the screening property of this index is proved theoretically and its ranking consistency property is established. Monte Carlo simulations were carried out in the same two settings, with comparisons against the PC-SIS, IG-SIS, and CIES methods. The simulation results show that CW-CIES is better at selecting the truly relevant variables in both settings, and that only CW-CIES can select the covariates that are conditionally related to the response variable.

The screening methods in this thesis rely on information entropy, which is built from probabilities alone. The methods are therefore model-free and applicable under any model. The simulation results indicate that they are best suited to the case in which the covariates are marginally independent of the response variable but jointly correlated with it.
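
The general idea of entropy-based screening conditional on an exposure variable can be illustrated with a minimal Python sketch. This is not the C-CIES or CW-CIES index, whose exact form the abstract does not give. As a stand-in, it scores each discrete covariate by the drop in the entropy of the response within each level of the exposure variable, averaged over levels (a plug-in estimate of conditional mutual information). The function names `entropy`, `conditional_entropy`, `stratified_gain`, and `screen`, and the toy data, are all hypothetical.

```python
import numpy as np

def entropy(labels):
    """Plug-in Shannon entropy (in nats) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def conditional_entropy(y, x):
    """Plug-in estimate of H(y | x) for discrete 1-D arrays y and x."""
    total = 0.0
    for value in np.unique(x):
        mask = x == value
        total += mask.mean() * entropy(y[mask])
    return total

def stratified_gain(x, y, z):
    """Illustrative score (assumption, not the thesis index):
    H(y | z) - H(y | x, z), computed stratum by stratum over the levels of z."""
    score = 0.0
    for level in np.unique(z):
        mask = z == level
        gain = entropy(y[mask]) - conditional_entropy(y[mask], x[mask])
        score += mask.mean() * gain
    return score

def screen(X, y, z, keep):
    """Rank the columns of X by stratified_gain and return the top `keep` indices."""
    scores = np.array([stratified_gain(X[:, j], y, z) for j in range(X.shape[1])])
    order = np.argsort(-scores)  # descending by score
    return order[:keep], scores

# Toy data: y depends on column 0 only through its interaction with z,
# so column 0 is marginally independent of y but informative given z.
rng = np.random.default_rng(0)
n, p = 2000, 20
z = rng.integers(0, 2, n)
X = rng.integers(0, 2, (n, p))
y = X[:, 0] ^ z
selected, scores = screen(X, y, z, keep=3)
print("selected columns:", selected)
print("score of column 0:", scores[0])
```

In this toy setting an unconditional entropy score would miss column 0, because the column carries no information about the response until the exposure variable is taken into account. That is the situation the abstract describes as marginal independence but joint correlation.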
Keywords/Search Tags:exposure variable, feature screening, conditional information entropy, ultrahigh dimensional classification data, conditional correlation