Construction And Analysis Of Knockoff Based Variable Selection Algorithm

Posted on: 2023-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: X B Zhao
GTID: 2530306842971839
Subject: Applied statistics
Abstract/Summary:
Before modeling the relationship between variables and responses, we often collect as many variables as possible, which can make computation difficult and invoke the curse of dimensionality. Variable selection locates the relevant variables and thereby improves the model's computational feasibility and interpretability, and it has attracted much attention in the statistics and machine learning literature. Existing variable selection methods can be roughly divided into three categories: sparse regularization methods, methods based on multiple hypothesis testing, and Knockoff inference. Because of its theoretical guarantees on controlling the false discovery rate (FDR), Knockoff inference has been successfully applied to data analysis in biology and astronomy. Knockoff inference consists of the following steps: generating knockoff variables, regression estimation, computing variable importance, and threshold truncation. However, previous Knockoff inference methods usually depend heavily on coefficient-based variable importance and are concerned only with controlling the FDR. This paper goes beyond these restrictions and proposes an error-based knockoff inference method that integrates knockoff features, error-based feature importance metrics, and the Stepdown procedure. The proposed method does not require specifying a regression model and can perform feature selection with theoretical guarantees on controlling the FDR, the false discovery proportion (FDP), or the k-familywise error rate (k-FWER). In theory, this work provides a theoretical analysis of the proposed approach with respect to controlled variable selection, power, and robustness. In applications, its effectiveness for controlled variable selection and regression estimation is validated by experimental evaluations on simulated data and HIV drug resistance data.
Keywords/Search Tags: Variable Selection, False Discovery Rate (FDR), False Discovery Proportion (FDP), Knockoff, Multiple Hypothesis Testing
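To make the "threshold truncation" step of Knockoff inference concrete, the following is a minimal sketch of the classical knockoff+ threshold for FDR control. It is not the error-based method or the Stepdown procedure proposed in the thesis, whose details the abstract does not give. The function names `knockoff_plus_threshold` and `select_features` are hypothetical, and the input `w` is assumed to be a list of already computed importance statistics, one per variable, where a large positive value means the original variable looks more important than its knockoff copy.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Return the knockoff+ threshold for target FDR level q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns infinity when no candidate t satisfies the bound.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy count of false discoveries
        positives = sum(1 for x in w if x >= t)   # variables that would be selected
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def select_features(w: Sequence[float], q: float) -> List[int]:
    """Return the indices j with w_j at or above the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    # Illustrative statistics only; these are not results from the thesis.
    w_demo = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 0.4, -0.3, 2.0, 1.5]
    print(select_features(w_demo, q=0.2))
```

The count of large negative statistics serves as an estimate of how many of the large positive ones are false discoveries, which is why the ratio inside the loop acts as an estimated FDP.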