
Prediction of transcription factor binding sites using information from multiple species

Posted on: 2011-08-28
Degree: Ph.D.
Type: Thesis
University: University of Colorado Health Sciences Center
Candidate: Siewert, Elizabeth Allan
GTID: 2440390002957740
Subject: Biology
Abstract/Summary:
De novo identification of transcription factor binding sites (TFBS) is a challenging computational problem because TFBS are relatively short sequences buried in long genomic regions. Earlier methods combined genome-wide expression data and promoter sequences in a linear-model framework, regressing expression on counts of putative TFBS in the promoters of a single species. More recently, it has been shown that including sequence data from multiple species improves the predictive ability of this regression model.

In this thesis, we describe two extensions of this single-species linear-model framework. These algorithms extend the search space to both the sequence and the expression information from all available genes across multiple species. Our first model uses a repeated-measures approach, treating the gene-expression measurements across species as repeated measurements across evolutionary time; it imposes the phylogenetic relationships among the species on the error covariance structure. Our second model uses a Bayesian hierarchical approach, imposing the phylogenetic relationships among the species on the prior distributions of the regression coefficients. For each model, we also consider two ways of adding motifs: (1) forward selection, retaining all previously selected covariates in the model, or (2) using the residual expression measures as the response for each subsequent regression.

These multiple-species algorithms were developed on a data set of four yeast species grown under heat-shock conditions. We compare them first with the single-species algorithm and then with one another. Judged by the information content of the predicted motifs and by comparison with two independent data sets, we find that all of the multiple-species methods improve TFBS prediction over the single-species algorithm.
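The single-species baseline that both extensions build on can be sketched in a few lines. This is a minimal, hypothetical illustration under stated assumptions, not the thesis's code: it counts every 3-mer on the forward strand only, scores each candidate by the R² of a simple least-squares fit of expression on its count, and follows option (2), regressing each subsequent motif on the residual expression. The function names `count_kmer`, `fit_simple_ols` and `residual_motif_search`, the `min_r2` stopping rule, and the four toy promoters with their expression values are all assumptions made for illustration.

```python
# Hypothetical sketch of a single-species, residual-based motif regression.
# Assumptions: forward strand only, exhaustive k-mers, R^2 as the score,
# and a min_r2 threshold to stop the search. Toy data, not from the thesis.
from itertools import product


def count_kmer(seq: str, kmer: str) -> int:
    """Count overlapping occurrences of `kmer` on the given strand only."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def fit_simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, residuals, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # count is constant across genes, so it explains nothing
        return my, 0.0, [yi - my for yi in y], 0.0
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    syy = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - sum(r * r for r in resid) / syy if syy > 0 else 0.0
    return a, b, resid, r2


def residual_motif_search(promoters, expression, k=3, max_motifs=3, min_r2=0.1):
    """Greedy search, 'residual' variant: each round fits every k-mer count
    against the current response, keeps the best, and passes its residuals on."""
    y = list(expression)
    chosen = []
    candidates = ["".join(p) for p in product("ACGT", repeat=k)]
    for _ in range(max_motifs):
        best = None  # (kmer, r2, slope, residuals)
        for kmer in candidates:
            if any(kmer == c[0] for c in chosen):
                continue
            x = [count_kmer(s, kmer) for s in promoters]
            _, b, resid, r2 = fit_simple_ols(x, y)
            if best is None or r2 > best[1]:
                best = (kmer, r2, b, resid)
        if best is None or best[1] < min_r2:
            break  # no remaining k-mer explains enough of the response
        chosen.append(best[:3])
        y = best[3]  # next round regresses on the residual expression
    return chosen


promoters = ["ACACACAC", "CGGGTCAC", "AGGGCGGGT", "TGGGAACA"]
expression = [1.0, 3.0, 5.0, 3.0]  # toy data: 1 + 2 * (GGG count)
print(residual_motif_search(promoters, expression))  # [('GGG', 1.0, 2.0)]
```

Option (1) would instead refit a multiple regression that keeps every previously selected count as a covariate. The multiple-species models would replace the per-k-mer ordinary least squares with a fit whose error covariance (repeated-measures model) or coefficient priors (Bayesian hierarchical model) reflect the phylogeny; that step is not sketched here.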