Inference and Prediction for High Dimensional Data via Penalized Regression and Kernel Machine Methods

Posted on:2013-03-12

Degree:Ph.D

Type:Dissertation

University:Harvard University

Candidate:Minnier, Jessica Nicole

Full Text:PDF

GTID:1450390008463389

Subject:Biology

Abstract/Summary:

Analysis of high dimensional data often seeks to identify a subset of important features and assess their effects on the outcome. Furthermore, the ultimate goal is often to build a prediction model with these features that accurately assesses risk for future subjects. Such statistical challenges arise in the study of genetic associations with health outcomes. However, accurate inference and prediction with genetic information remain challenging, in part due to the complexity of the genetic architecture of human health and disease.

A valuable approach for improving prediction models with a large number of potential predictors is to build a parsimonious model that includes only important variables. Regularized regression methods are useful, though they often pose challenges for inference due to nonstandard limiting distributions or finite sample distributions that are difficult to approximate. In Chapter 1 we propose and theoretically justify a perturbation-resampling method to derive confidence regions and covariance estimates for marker effects estimated from regularized procedures with a general class of objective functions and concave penalties. Our methods outperform their asymptotic-based counterparts, even when effects are estimated as zero.

In Chapters 2 and 3 we focus on genetic risk prediction. The difficulty in accurate risk assessment with genetic studies can in part be attributed to several potential obstacles: sparsity in marker effects, a large number of weak signals, and non-linear effects. Single marker analyses often lack power to select informative markers and typically do not account for non-linearity. One approach to gain predictive power and efficiency is to group markers based on biological knowledge such as genetic pathways or gene structure. In Chapter 2 we propose and theoretically justify a multi-stage method for risk assessment that imposes a naive Bayes kernel machine (KM) model to estimate gene-set specific risk models, and then aggregates information across all gene-sets by adaptively estimating gene-set weights via a regularization procedure. In Chapter 3 we extend these methods to meta-analyses by introducing sampling-based weights in the KM model. This permits building risk prediction models with multiple studies that have heterogeneous sampling schemes.
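
The perturbation-resampling idea mentioned for Chapter 1 can be illustrated with a minimal sketch. This is not the dissertation's procedure: the abstract does not specify the weight distribution, the penalty, or how the confidence regions are constructed. The sketch assumes i.i.d. Exponential(1) perturbation weights (mean 1, variance 1), swaps in a plain lasso penalty fitted by coordinate descent in place of the concave penalties the abstract refers to, and reports simple percentile intervals. The function names `weighted_lasso` and `perturbation_intervals` and the simulated data are hypothetical.

```python
import numpy as np


def weighted_lasso(X, y, w, lam, n_iter=50):
    """Coordinate descent for (1/2n) * sum_i w_i (y_i - x_i'b)^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float)  # running residual y - X @ beta (copy of y)
    col_norm = (w[:, None] * X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Weighted correlation of column j with the partial residual.
            rho = (w * X[:, j] * (resid + X[:, j] * beta[j])).sum() / n
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_norm[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def perturbation_intervals(X, y, lam, n_perturb=200, level=0.95, seed=0):
    """Refit under random observation weights; return estimate and percentile limits."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_hat = weighted_lasso(X, y, np.ones(n), lam)  # unperturbed fit
    draws = np.empty((n_perturb, p))
    for b in range(n_perturb):
        g = rng.exponential(1.0, size=n)  # assumed weights: mean 1, variance 1
        draws[b] = weighted_lasso(X, y, g, lam)
    alpha = 1.0 - level
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
    return beta_hat, lower, upper


# Simulated data for illustration only (not from the dissertation).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.0, 0.0]) + rng.normal(size=200)
beta_hat, lower, upper = perturbation_intervals(X, y, lam=0.1, n_perturb=100)
for j in range(X.shape[1]):
    print(f"beta[{j}] = {beta_hat[j]: .3f}  interval = ({lower[j]: .3f}, {upper[j]: .3f})")
```

The empirical spread of the perturbed refits stands in for the sampling distribution of the regularized estimate, so no closed-form limiting distribution is needed. Plain percentile intervals are the simplest summary of that spread; they are likely cruder than the confidence regions the dissertation derives, particularly for effects estimated as exactly zero.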

Keywords/Search Tags:

Prediction, Risk, Effects, Inference, Methods, Model

Related items

1	Screening And Risk Prediction Model Establishment Of Hub Genes In Prostate Cancer Based On Bioinformatic Methods
2	Statistical methods to adjust for measurement error in risk prediction models and observational studies
3	Exact Selective Inference Based On Several Penalty Methods
4	Bayesian Inference For The Joint Model Of Non-ignorable Missing And Skewed Distribution
5	Application Of Martingale In Risk Model
6	Mixed Effects Model Of Statistical Inference And Application
7	Parametric bootstrap interval approach to inference for fixed effects in the mixed linear model
8	Exploration Of Risk Factors And Construction Of Prediction Models For Cervical Cancer In High Risk HPV Positive Women
9	Study Of Lightning Risk And Prediction Methods Of Yangkou Port
10	Research On Risk Prediction Methods Of Large-scale Unbalanced Guarantee Networks