Building Risk Prediction Model for Complex Genetic Disease Using High Dimensional Genetic Data

Posted on: 2012-11-23
Degree: Ph.D
Type: Dissertation
University: Yale University
Candidate: Kang, Jia
Full Text: PDF
GTID: 1454390008494837
Subject: Biology
Abstract/Summary:
An important topic in genetic studies of human diseases is predicting an individual's risk of developing a particular disease. This knowledge can assist physicians in disease prevention, diagnosis, prognosis, and treatment. For diseases with a significant genetic component, traditional risk assessment relies primarily on nongenetic risk factors and family history. The limitation of this approach is apparent: a better prediction rule is expected if known genetic variations affecting disease risk are incorporated into the model.

Recent advances in genome-wide association studies (GWAS) have led to the discovery of hundreds of chromosomal regions associated with risk for dozens of diseases. A natural question following these successes is how to translate these discoveries most effectively into better disease risk prediction models. However, risk prediction using GWAS data is challenging because, for many common diseases, risk is jointly affected by many genes, nongenetic risk factors, and their interactions. In addition, genome-wide association studies are often underpowered, which makes accurate inference of genetic variants' effect sizes almost impossible.

This dissertation addresses the challenging task of risk prediction using high dimensional genetic data.

Chapter 1 discusses practical issues in building risk prediction models from high dimensional GWAS data and reviews methods widely adopted in the genetic risk prediction literature. Chapter 2 systematically investigates, through simulation studies and real data analysis, the factors that influence the performance of single-SNP-based risk prediction models. Chapter 3 expands the predictor set from single SNPs to multi-locus markers and compares the performance of haplotype-based risk prediction models with that of SNP-based models.

Recognizing that feature selection is the bottleneck of risk prediction, Chapters 4-6 propose several models that integrate different types of genetic data and/or biological priors to improve feature selection, all leading to improved downstream risk prediction.
Keywords/Search Tags: Prediction, Genetic, Disease, GWAS data, Studies