Building Risk Prediction Model for Complex Genetic Disease Using High Dimensional Genetic Data

Posted on:2012-11-23

Degree:Ph.D

Type:Dissertation

University:Yale University

Candidate:Kang, Jia

GTID:1454390008494837

Subject:Biology

Abstract/Summary:

An important topic in genetic studies of human diseases is the prediction of an individual's risk of developing a particular disease. This knowledge can assist physicians in disease prevention, diagnosis, prognosis, and treatment. Traditional approaches to assessing patients' risk for diseases with a significant genetic component rely primarily on nongenetic risk factors and family history information. The limitation of these approaches is apparent: a better prediction rule is expected if known genetic variations affecting disease risk are incorporated into the model.

Recent advances in genome-wide association studies (GWAS) have led to the discovery of hundreds of chromosomal regions associated with risk for dozens of diseases. One natural question following these successes is how to most effectively translate these discoveries into better disease risk prediction models. However, risk prediction using GWAS data is a challenging task because, for many common diseases, disease risk is jointly affected by many genes, nongenetic risk factors, and their interactions. In addition, genome-wide association studies are often severely underpowered, which makes accurate inference of genetic variants' effect sizes almost impossible.

This dissertation focuses on addressing the challenging task of performing risk prediction using high dimensional genetic data.

Chapter 1 discusses practical issues concerning the establishment of risk prediction models using high dimensional GWAS data and reviews popular methods that are widely adopted in the genetic risk prediction literature. Chapter 2 systematically investigates various factors that influence the performance of single SNP based risk prediction models through simulation studies and real data analysis. Chapter 3 expands the predictor set from single SNPs to multi-locus markers, and compares the performance of haplotype based risk prediction models to that of SNP based models.

Recognizing that feature selection is the bottleneck of risk prediction, in Chapters 4-6 we propose several models that integrate different types of genetic data and/or biological priors to facilitate better feature selection, all leading to improved downstream risk prediction.

Keywords/Search Tags:

Prediction, Genetic, Disease, GWAS data, Studies

Related items

1	Clustering by genetic ancestry using genome-wide single nucleotide polymorphisms and incorporating genetic ancestry into genetic risk prediction models
2	Genetic Algorithm Based Composite Kernel Partial Least Square In Disease Prediction And Classification With Genomic Data
3	Statistical methods in genetic association studies
4	Genetic Association Studies of Alzheimer Disease Using Multi-Phenotype Tests and Gene-Based Test
5	Prediction Of Protein-coding Genes And Genetic Disease Relevant Genes
6	Molecular Genetic Studies Of Coronary Artery Disease In The Chinese Han Population
7	Research On GWAS Data Mining Of Alzheimer Disease Based On Protein-protein Interaction Network
8	Analysis On Risk Prediction Model For Complex Diseases And Data Mining On Genetic Variants
9	Statistical and Computational Methods on GWAS and post-GWAS Analysis to Identify Genetic Basis of Intracranial Aneurysm
10	Genetic Risk Prediction For Complex Traits With Genome-wide Data