Statistical methods for sub-class discovery on genomic structures with quantitative outcomes

Posted on: 2017-02-17
Degree: Ph.D.
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Ye, Shuyun
GTID: 1444390005469378
Subject: Statistics
Abstract/Summary:
Technological advances have enabled researchers in genomic studies to collect various forms of data, such as genotype, phenotype, and clinical information on subjects of interest. Researchers have also long been aware that most populations contain sub-classes whose different genetic structures give rise to the same outcome. However, statistical methods for identifying possible sub-classes with different predictors of important outcomes in the subjects under study are still limited. In this dissertation, I develop, evaluate, implement and apply statistical and computational methods to discover possible latent classes in a population, where each class has a distinct genetic structure affecting important outcomes of the individuals, such as survival time and important quantitative traits.

The first framework, survival-supervised latent Dirichlet allocation (survLDA), extends a classic information retrieval (IR) algorithm to genomic-based studies of disease. LDA models have proven extremely effective at identifying themes common across large collections of text, but applications to genomics have been limited. Our framework extends LDA to the genome by considering each patient as a "document" with "text" detailing her clinical events and genomic state. We then further extend the framework to allow for supervision by a time-to-event response. An application of survLDA to The Cancer Genome Atlas (TCGA) ovarian project identifies informative patient sub-classes showing differential response to treatment, and validation in an independent cohort demonstrates the potential for patient-specific inference.

The second framework is a latent class quantitative trait loci mapping (lcQTL) method. Most existing QTL mapping methods assume a single model common to all subjects and consequently sacrifice power and accuracy when genetically distinct sub-classes exist. To address this, we have developed lcQTL to enable latent class QTL mapping.
The approach combines latent class regression with stepwise variable selection and traditional QTL mapping to estimate the number of sub-classes in a population and to identify the genetic model that best describes each sub-class. Application of the method to case studies of obesity and diabetes in mice gives insight into the genetic basis of related complex traits.
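
To make the latent class regression idea concrete, the following is a minimal sketch of an EM fit for a finite mixture of linear regressions, in which each latent sub-class has its own marker effects on a quantitative trait. It is an illustration only and not the lcQTL implementation from the dissertation: the function names `simulate_two_class_trait` and `fit_latent_class_regression`, the simulated single-marker data, and the fixed number of classes are all assumptions, and the stepwise variable selection and estimation of the number of sub-classes described above are omitted.

```python
import numpy as np


def simulate_two_class_trait(n=400, effect=3.0, noise=0.1, seed=1):
    """Hypothetical toy data: one binary marker whose effect on the trait
    has opposite sign in two hidden sub-classes."""
    rng = np.random.default_rng(seed)
    genotype = rng.integers(0, 2, size=n).astype(float)
    latent = rng.integers(0, 2, size=n)            # hidden class label
    sign = np.where(latent == 0, 1.0, -1.0)
    y = sign * effect * genotype + rng.normal(0.0, noise, size=n)
    return genotype[:, None], y, latent


def fit_latent_class_regression(X, y, n_classes=2, n_iter=200, seed=0):
    """EM for a mixture of Gaussian linear regressions (illustrative sketch).

    Returns mixing weights, per-class coefficients (intercept first),
    per-class residual standard deviations, and responsibilities.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])          # design matrix with intercept
    resp = rng.dirichlet(np.ones(n_classes), size=n)  # random soft start

    for _ in range(n_iter):
        # M-step: weighted least squares within each latent class.
        weights = np.clip(resp.mean(axis=0), 1e-12, None)
        betas = np.empty((n_classes, p + 1))
        sigmas = np.empty(n_classes)
        for k in range(n_classes):
            w = resp[:, k]
            sw = np.sqrt(w)
            betas[k] = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)[0]
            resid_k = y - Xd @ betas[k]
            sigmas[k] = np.sqrt((w * resid_k**2).sum() / (w.sum() + 1e-12)) + 1e-6

        # E-step: posterior class probabilities, computed on the log scale.
        resid = y[:, None] - Xd @ betas.T
        log_dens = np.log(weights) - np.log(sigmas) - 0.5 * (resid / sigmas) ** 2
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)

    return weights, betas, sigmas, resp
```

On the toy data, a single regression fitted to all subjects would estimate a marker effect near zero because the two opposite effects cancel, whereas the mixture fit can recover a separate slope for each class. This is the loss of power under a single common model that the abstract describes.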
Keywords/Search Tags:Genomic, Methods, Class, Studies, Statistical, Quantitative, Genetic