Interaction Screening Of Ultra-high Dimensional Data Based On Distance Correlation

Posted on: 2021-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Yan
GTID: 2517306113453474
Subject: Statistics
Abstract/Summary:
Ultra-high dimensionality is an important feature of data collected in contemporary scientific research. In interaction research on ultra-high dimensional data, existing screening methods rely on a pre-assumed specific model, so their practical performance depends on how closely the real model matches the hypothesized one. When the real model deviates from the hypothesized model, screening may select the wrong variables. This paper extends the model-free approach used in main-effect screening to the interaction model and proposes new model-free interaction screening methods. The main contents and conclusions of this paper are as follows:
(1) We propose two new model-free interaction screening methods based on distance correlation, called ISDC-T and ISDC-B. They require neither a hierarchical model assumption nor a model specification, and are therefore robust for complex data where model information is missing. ISDC-T is a two-stage method that screens main effects and interaction effects separately, which rapidly reduces the interaction candidate set and improves screening efficiency. ISDC-B uses beam search for interaction screening; compared with forward selection, beam search enlarges the search space and can effectively improve interaction screening performance. Theoretically, we prove that ISDC-T possesses the sure screening property for interaction selection in ultra-high dimensional settings.
(2) We propose two new data-driven threshold rules, based on kernel density estimation and pseudo-variable generation respectively. They determine the cut-off value adaptively from the data, which automates the screening process and controls the size of the final model. Both rules can control the false discovery rate and help improve the operability and interpretability of the screening method. In addition, the numerical results show that, under various settings, the screening performance of the proposed algorithms is significantly better than that of existing ultra-high dimensional interaction screening methods.
(3) The proposed algorithms are applied to two real gene expression datasets: rat microarray gene expression and human genotype-tissue expression. The experimental results show that ISDC-T and ISDC-B, combined with the threshold rules, are effective in analyzing complex datasets, providing a new method and perspective for gene interaction selection.
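The abstract does not spell out the ISDC-T procedure, so the following is only a minimal sketch of the general idea of two-stage screening with distance correlation, not the thesis's algorithm. It rests on several assumptions: interactions are scored through the product term x_j * x_k, stage-two candidates are restricted to variables retained in stage one, and fixed cut-offs stand in for the data-driven threshold rules. The names `distance_correlation`, `two_stage_screen`, `keep_main`, and `keep_pairs` are hypothetical.

```python
import math
import random
from itertools import combinations


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute distances for a 1-D sample."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row_mean = [sum(r) / n for r in d]  # equals the column means: d is symmetric
    grand = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D samples."""
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:  # a constant sample has zero distance variance
        return 0.0
    return math.sqrt(max(mean_product(a, b), 0.0) / denom)


def two_stage_screen(X, y, keep_main, keep_pairs):
    """Hypothetical two-stage screen: rank variables, then rank pairwise products."""
    p = len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    # Stage 1: marginal distance correlation of each predictor with the response.
    main = sorted(((j, distance_correlation(columns[j], y)) for j in range(p)),
                  key=lambda t: t[1], reverse=True)[:keep_main]
    kept = sorted(j for j, _ in main)
    # Stage 2: score each candidate pair through its product term (an assumption).
    pairs = []
    for j, k in combinations(kept, 2):
        product = [u * v for u, v in zip(columns[j], columns[k])]
        pairs.append(((j, k), distance_correlation(product, y)))
    pairs.sort(key=lambda t: t[1], reverse=True)
    return main, pairs[:keep_pairs]


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(60)]
    # Simulated response driven only by the interaction of columns 0 and 1.
    y = [row[0] * row[1] + 0.1 * rng.gauss(0, 1) for row in X]
    main, pairs = two_stage_screen(X, y, keep_main=5, keep_pairs=3)
    print("retained variables:", [j for j, _ in main])
    print("top pairs:", pairs)
```

Each score costs O(n^2) time in the sample size, so a pure-Python version like this is only suitable for small illustrations; the stage-one cut is what keeps the number of candidate pairs manageable when the number of predictors is large.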
Keywords/Search Tags: Ultra-high Dimensional Data, Interaction Effect, Distance Correlation, Model-Free, Sure Screening Property