
Feature Screening In Ultrahigh Dimensional Categorical Data Based On The Conditional Information Entropy

Posted on: 2018-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: C N Sun
GTID: 2310330518997499
Subject: Mathematics
Abstract:
In recent years, computational cost, screening efficiency and algorithm stability in the analysis of ultrahigh dimensional data have become active research topics. Feature screening for ultrahigh dimensional data has been widely used in biological imaging, high-frequency time series analysis, tumor classification, economic forecasting, and other big data problems. However, traditional variable selection methods for high dimensional data are not suited to ultrahigh dimensional problems. Researchers have therefore built screening indices from the correlation between covariates and the response, proposed marginal screening methods for different models and data types, and obtained good results in real data analyses.
This thesis departs from that line of work, which is based on the dependence between variables, and instead starts from the amount of information a variable carries. Information entropy measures the information contained in a variable, and the basic idea is that a covariate carrying little information contributes little to the response and can be regarded as unimportant. On this basis, the thesis constructs screening indices from the amount of information and uses them for feature screening in ultrahigh dimensional data. For a binary response, it proposes the conditional information entropy screening method (CIES), whose screening index is built from the difference between the information entropies of a covariate under the different response classes. When the response has more than two classes, it combines the CIES index with the class probabilities and proposes the weighted conditional information entropy screening method (W-CIES).
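The abstract describes the CIES and W-CIES indices only in words, so the following is a minimal Python sketch under stated assumptions, not the thesis's exact definitions. It assumes plug-in (empirical) entropies with natural logarithms, takes the CIES index for a binary response to be |H(X_k | Y=0) - H(X_k | Y=1)|, and takes the W-CIES index to be the sum over classes r of p_r * |H(X_k | Y=r) - H(X_k)|, where p_r is the sample proportion of class r. All function names (`entropy`, `cies_index`, `wcies_index`, `screen`, `simulate`) and the toy data design are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Plug-in Shannon entropy (natural log) of a sequence of category labels."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def conditional_entropies(x, y):
    """Map each response class r to the plug-in entropy H(X | Y = r)."""
    return {r: entropy([xi for xi, yi in zip(x, y) if yi == r]) for r in set(y)}


def cies_index(x, y):
    """Assumed CIES-style index for a binary response: |H(X|Y=a) - H(X|Y=b)|."""
    h = conditional_entropies(x, y)
    a, b = sorted(h)  # raises ValueError unless y has exactly two classes
    return abs(h[a] - h[b])


def wcies_index(x, y):
    """Assumed W-CIES-style index: sum over classes r of p_r * |H(X|Y=r) - H(X)|."""
    n = len(y)
    hx = entropy(x)
    class_counts = Counter(y)
    h = conditional_entropies(x, y)
    return sum(class_counts[r] / n * abs(h[r] - hx) for r in h)


def screen(columns, y, d, index=wcies_index):
    """Score every covariate column and return the indices of the d highest scores."""
    scores = [index(col, y) for col in columns]
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return ranked[:d]


def simulate(n=200, p=50, seed=0):
    """Hypothetical toy design: column 0 depends on y, columns 1..p-1 are noise."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]
    # Active covariate: mostly category 0 when y == 0, uniform on {0, 1, 2} when y == 1.
    active = [
        (0 if rng.random() < 0.9 else rng.randint(0, 2)) if yi == 0 else rng.randint(0, 2)
        for yi in y
    ]
    noise = [[rng.randint(0, 2) for _ in range(n)] for _ in range(p - 1)]
    return [active] + noise, y


if __name__ == "__main__":
    X, y = simulate()
    print("CIES keeps:", screen(X, y, d=5, index=cies_index))
    print("W-CIES keeps:", screen(X, y, d=5))
```

In the toy design, column 0 is nearly constant when y = 0 and close to uniform when y = 1, so its class-conditional entropies differ sharply and both indices rank it first, while the pure-noise columns score close to zero. Under this assumed form, an index built only from entropy differences is unchanged by relabelling categories, so a covariate whose class-conditional distributions are permutations of each other would score zero; the thesis's actual index may be defined differently.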
Theoretically, both screening methods possess the sure screening property, and Monte Carlo simulations further confirm it. Because the screening indices are built from information entropy and conditional information entropy, both of which are functions of probabilities only, the theoretical proofs are simplified. Moreover, the proposed methods are model-free: they do not rely on a particular model form, which makes them widely applicable.
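For reference, the standard definitions behind the statement that the indices are functions of probabilities are given below. The notation is introduced here for illustration and is not taken from the thesis: covariate X_k takes categories 1, ..., J_k, the response Y takes classes 1, ..., R, D is the set of truly important covariates, and D-hat is the set retained by screening. Replacing each probability with its sample proportion gives the plug-in estimates, so estimating the indices reduces to estimating finitely many cell proportions.

```latex
\begin{aligned}
H(X_k) &= -\sum_{j=1}^{J_k} p_{kj} \log p_{kj},
  & p_{kj} &= P(X_k = j), \\
H(X_k \mid Y = r) &= -\sum_{j=1}^{J_k} p_{kj \mid r} \log p_{kj \mid r},
  & p_{kj \mid r} &= P(X_k = j \mid Y = r), \\
H(X_k \mid Y) &= \sum_{r=1}^{R} p_r \, H(X_k \mid Y = r),
  & p_r &= P(Y = r).
\end{aligned}
\qquad
\text{Sure screening: } P\bigl(\mathcal{D} \subseteq \widehat{\mathcal{D}}\bigr) \to 1
\ \text{as } n \to \infty.
```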
Keywords: Ultrahigh dimensional data, Feature screening, Categorical data, Information entropy, Conditional information entropy