
Elastic Correlation Adjusted Regression (ECAR) Score For High Dimensional Variable Importance Measuring

Posted on: 2021-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
GTID: 2517306503986929
Subject: Statistics
Abstract/Summary:
Investigating the genetic basis of traits or clinical outcomes relies heavily on identifying relevant variables in molecular data. However, the high dimensionality and complex correlation structure of these data keep this process from reaching its full potential and lead to both false positives and false negatives. Here we develop a variable importance measure, termed the ECAR score, that evaluates the importance of each variable in a dataset. Based on this score, variables can be ranked and selected simultaneously. Unlike most current approaches, the ECAR score aims to rank the influential variables as high as possible while maintaining the grouping property, instead of selecting variables that are merely predictive. The performance of the ECAR score is tested and compared with other methods on simulated, semisynthetic, and real datasets. The results show that, when the influential variables are highly correlated, the ECAR score produces variable importance measures that improve on the CAR score and outperform classic methods such as the Lasso and stability selection, both in the accuracy of variable selection and in the predictive power of high-ranking variables. As an application, we used the ECAR score to analyze genes associated with FEV1 (forced expiratory volume in the first second) in patients with lung cancer, and six associated genes were reported.
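The abstract does not define the ECAR score itself, so no attempt is made to reproduce it here. As background only, the following is a minimal sketch of the baseline CAR score that ECAR is said to improve on, assuming it is the CAR score of Zuber and Strimmer: the marginal correlations between predictors and response, decorrelated by the inverse square root of the predictor correlation matrix. The function name car_scores and the simulated data are illustrative assumptions, and the sketch uses the plain empirical correlation matrix, which requires more samples than variables; a genuinely high-dimensional setting would need a regularized (for example, shrinkage) correlation estimate instead.

```python
import numpy as np

def car_scores(X, y):
    """Empirical CAR scores: omega = R^{-1/2} r_Xy.

    Sketch only: assumes n > p so the sample correlation matrix R is
    invertible. Not the ECAR score from the thesis.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Standardize predictors and response (population std, ddof=0).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    R = Xs.T @ Xs / n        # predictor correlation matrix
    r_xy = Xs.T @ ys / n     # marginal correlations with the response
    # Inverse matrix square root via the symmetric eigendecomposition.
    evals, evecs = np.linalg.eigh(R)
    R_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return R_inv_sqrt @ r_xy

# Illustrative simulated data: variables 0 and 1 are strongly correlated
# and both influence the response; variables 2-4 are pure noise.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)
y = X[:, 0] + X[:, 1] + rng.normal(size=n)

omega = car_scores(X, y)
ranking = np.argsort(-omega ** 2)  # variable indices, most important first
print(omega, ranking)
```

Ranking by squared CAR score is a convenient choice because, with the empirical estimator used in this sketch, the squared scores sum to the coefficient of determination of the ordinary least-squares fit, so each one can be read as that variable's share of the explained variance.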
Keywords/Search Tags: High-dimensional, Variable importance, Gene selection, CAR score