Research On Feature Screening Method For Ultrahigh Dimensional Discriminant Analysis Data

Posted on: 2020-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: B H Shen
Full Text: PDF
GTID: 2370330623457311
Subject: Mathematics
Abstract/Summary:
With the advancement of contemporary scientific research and technology, ultrahigh dimensional data have penetrated all fields of modern society. This is both an opportunity and a challenge for statisticians. On the one hand, massive data can be obtained at low cost. On the other hand, traditional statistical analysis methods are no longer applicable because of their high computational cost and low efficiency. Since only a few covariates are related to the response in ultrahigh dimensional data (the sparsity assumption), statisticians have begun to investigate feature screening methods. The aim is to reduce the dimension of the data to an ordinary high-dimensional scale, and then to apply traditional methods for further analysis. As an important branch of ultrahigh dimensional research, ultrahigh dimensional discriminant analysis data arise in bioinformatics, proteomics, face recognition, brain imaging, machine learning, social network analysis and other fields. It is therefore particularly important to study feature screening methods for such data. This paper presents feature screening methods for ultrahigh dimensional discriminant analysis data from three different perspectives.

First, this paper proposes a feature screening index (MS) based on the conditional distribution, which can be applied directly to the multi-category situation. Compared with existing screening methods, the MS screening method has the following advantages. Firstly, it is model-free and does not require specific model assumptions. Secondly, it is robust when the covariates follow heavy-tailed distributions. Thirdly, the sure screening property and the ranking consistency property can be proved under mild conditions. Numerical simulations and real-data analyses further verify the effectiveness of the method.

Next, note that if the local expectations of a predictor differ significantly, the predictor may contribute to discrimination. Based on this observation, the paper uses the ratio of the conditional variance to the unconditional variance, named the variance ratio, to measure the contribution of each predictor to the classification, and proposes the variance ratio sure independent screening (VR-SIS) procedure for ultrahigh dimensional discriminant analysis. The method can screen main effects and interaction effects simultaneously, and it allows the response to have a diverging number of categories. In addition, its simple structure makes it computationally inexpensive, so it can be widely applied in practice. This paper illustrates the finite sample properties of the method through Monte Carlo simulation studies and two real-data analyses.

Finally, to study the marginal relationship between each covariate and the response, this paper considers the difference in the values of each covariate across categories and uses it to detect important covariates associated with the response. The greater the difference, the greater the impact of the covariate on the classification. A novel model-free ultrahigh dimensional feature screening method, Mann-Whitney screening (MWS), is proposed for the case of a binary response. The paper then constructs the corresponding feature screening indicator for the multi-category case. The proposed screening method is model-free. Because it is invariant to monotone increasing transformations of the covariates, the method can capture nonlinear relationships between the response and the covariates. It is also robust to covariates with heavy-tailed distributions. In addition, this paper establishes the sure screening property, the ranking consistency property and control of false discoveries without imposing subexponential tail probability conditions. Simulation studies and a real data example are conducted to evaluate the finite sample performance of the screening procedure.
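
The idea behind Mann-Whitney screening for a binary response can be illustrated with a minimal sketch. This is not the thesis's exact statistic, which the abstract does not give. It assumes a marginal utility of the form |P(X0 < X1) + 0.5 P(X0 = X1) - 0.5|, estimated for each covariate from all pairs of observations taken from the two classes. The function names mann_whitney_scores and screen_top_d, and the choice to keep the top d covariates, are hypothetical and used for illustration only.

```python
import numpy as np


def mann_whitney_scores(X, y):
    """Marginal Mann-Whitney-type utility for each column of X.

    Assumed form (not taken from the thesis):
        |P_hat(X0 < X1) + 0.5 * P_hat(X0 == X1) - 0.5|
    where X0 and X1 are the covariate values in classes 0 and 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    x0, x1 = X[y == 0], X[y == 1]
    n0, n1 = len(x0), len(x1)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        a = x0[:, j][:, None]   # class-0 values as a column
        b = x1[:, j][None, :]   # class-1 values as a row
        # Mann-Whitney U statistic, counting ties as one half
        u = np.sum(a < b) + 0.5 * np.sum(a == b)
        scores[j] = abs(u / (n0 * n1) - 0.5)
    return scores


def screen_top_d(X, y, d):
    """Indices of the d covariates with the largest utility (hypothetical rule)."""
    scores = mann_whitney_scores(X, y)
    return np.argsort(-scores, kind="stable")[:d]


# Small demonstration on simulated heavy-tailed (Cauchy) covariates,
# where only column 3 is shifted between the two classes.
rng = np.random.default_rng(0)
n, p = 100, 200
y_demo = rng.integers(0, 2, size=n)
X_demo = rng.standard_cauchy(size=(n, p))
X_demo[y_demo == 1, 3] += 3.0
print(screen_top_d(X_demo, y_demo, 5))
```

Because the score depends on the data only through pairwise orderings, applying a strictly increasing transformation to a covariate leaves its score unchanged, and extreme values from heavy-tailed distributions have no more influence than any other observation.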
Keywords/Search Tags: Ultrahigh dimensional discriminant analysis data, conditional distribution function, variance ratio, Mann-Whitney screening, sure screening property