Encore: A computational framework for the integrative analysis of genome-wide association study data and other biological data

Posted on: 2013-04-27
Degree: Ph.D
Type: Dissertation
University: The University of Tulsa
Candidate: Davis, Nicholas A
Full Text: PDF
GTID: 1453390008469602
Subject: Biology
Abstract/Summary:
Analysis of genome-wide association study (GWAS) data has led to several interesting discoveries relevant to human disease. GWAS have uncovered key findings in high-profile diseases, including type 2 diabetes, coronary disease, and even mental illnesses such as bipolar disorder. However, the number of risk genes identified has fallen short of expectations, leading many to search for the missing risk in rare genetic variants and in more complex gene-gene interaction models. Because a typical GWAS can include millions of single-nucleotide polymorphisms (SNPs), the algorithms and software tools employed are crucial to performing an effective and rapid interaction analysis. Interactions with other biological data, such as brain volumetric data and fMRI time series, may reveal additional risk factors involved in particular diseases. Combining GWAS with these other domains of data offers a powerful approach to an integrative analysis of disease but poses many challenges.

In previous work, a number of methods have proven useful in gene-gene interaction analysis: (1) the Genetic Association Interaction Network (GAIN) uses information theory to determine the influence of SNPs in pairwise interactions, encoding interactions and main effects in a network; (2) the SNPrank eigenvector centrality algorithm computes the network importance of the SNPs present in a GAIN and ranks these features by their relevance to a given phenotype; and (3) Evaporative Cooling (EC) is a machine learning method that distills a genome-wide number of SNPs to a number feasible for pairwise interaction analysis. EC balances main effects and interactions by incorporating two machine learning methods, ReliefF and Random Forests.

My bioinformatics knowledge framework, Encore, unifies all of these methods with PLINK, third-party software that is a ubiquitous standard in GWAS analysis. In addition, Encore extends the original information-theoretic GAIN method to include regression-based network inference in a new approach called reGAIN. This extension allows Encore networks to correct for covariates and to combine other domains of quantitative data with GWAS data, creating an integrative network model of complex diseases. Encore and its dependencies are released as open-source tools so that others can inspect the underlying basis of the software and contribute new features and improvements. This suite of tools should facilitate the identification of novel factors relevant to common, complex human phenotypes.
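
The idea of ranking SNPs by eigenvector centrality in a GAIN-style network can be illustrated with a minimal sketch. It assumes a small symmetric matrix whose diagonal holds main effects and whose off-diagonal entries hold pairwise interaction strengths, and it uses plain power iteration. This is not the published SNPrank algorithm or Encore's implementation; the function names, SNP identifiers, and matrix values are all hypothetical.

```python
def eigenvector_centrality(matrix, iterations=200, tol=1e-10):
    """Approximate the dominant eigenvector of a non-negative square matrix
    by power iteration, normalized so the scores sum to 1."""
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # One multiplication of the matrix by the current score vector.
        updated = [sum(matrix[i][j] * scores[j] for j in range(n)) for i in range(n)]
        total = sum(updated)
        if total == 0:
            raise ValueError("matrix carries no positive weight")
        updated = [value / total for value in updated]
        # Stop once the scores no longer change appreciably.
        if max(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


def rank_snps(names, matrix):
    """Return (name, score) pairs sorted from most to least central."""
    scores = eigenvector_centrality(matrix)
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy network: diagonal = main effects, off-diagonal = interactions.
toy_names = ["rs1", "rs2", "rs3"]
toy_gain = [
    [0.10, 0.50, 0.05],
    [0.50, 0.20, 0.40],
    [0.05, 0.40, 0.05],
]

ranking = rank_snps(toy_names, toy_gain)
for name, score in ranking:
    print(f"{name}\t{score:.4f}")
```

In this toy example the SNP with the strongest interactions ranks first even though its main effect alone is modest, which is the kind of behavior a network-centrality ranking is meant to capture.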
Keywords/Search Tags:Data, GWAS, Genome-wide, Association, Encore, Integrative