
Sequence analysis methods for the detection of promoters and transcription factor binding sites

Posted on: 2007-05-21
Degree: Ph.D
Type: Thesis
University: Stanford University
Candidate: Naughton, Brian Thomas
Full Text: PDF
GTID: 2450390005487658
Subject: Biology
Abstract/Summary:
The detection of promoters and their associated transcription factor binding sites (DNA motifs) is an increasingly important biological problem. Knowledge of the location of every regulatory sequence in an organism would bring us one step closer to a computational model of the cell.

As our understanding of these binding sites becomes more sophisticated, the computational models used in their analysis must also advance. Accordingly, there is a need to incorporate increasing amounts of disparate data into these models. In this thesis we investigate the use of graphs to represent and detect transcription factor binding sites. We further augment these motif-finding algorithms by applying machine-learning techniques to incorporate heterogeneous data. In ancillary experiments, we apply aspects of this work to detecting promoters in the E. coli genome.

We developed MotifCut, a novel ab initio motif-finding algorithm. This method uses a graph-based representation of DNA sequence and methods from fractional programming to deterministically find the set of segments in a DNA sequence that are the most similar. These highly similar DNA segments are the most likely constituents of a motif in the sequence.

The MotifScan algorithm uses the same graphical representation as MotifCut to detect new examples of known binding sites, which it does by comparison to clusters of k-mers in the original motif graph. MotifScan was further extended by the addition of classification algorithms, specifically Support Vector Machines (SVMs), applied to external data. This allows us to add context to a putative binding site. For example, binding sites often function together as modules of regulation, so the location of other binding sites nearby can help us determine whether a binding site is real.

The methods developed to augment MotifScan were applied to the problem of promoter detection in E. coli. In this project we investigated the use of SVMs to combine data from a number of heterogeneous sources, including inferred DNA structure, to improve our ability to detect bacterial promoters. In the process we can learn something about the signals that RNA polymerase itself responds to.
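
As a rough illustration of the graph-based idea described in the abstract, the sketch below treats each k-mer of a sequence as a node, connects k-mers that are similar, and then searches for a dense group of mutually similar nodes. It is not the MotifCut algorithm: the similarity measure (fraction of matching positions), the threshold, and all function names are assumptions made for this example, and a simple greedy peeling heuristic stands in for the fractional programming methods the thesis uses.

```python
from itertools import combinations


def kmers(seq, k):
    """Return (start, k-mer) pairs for every overlapping k-mer in seq."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def similarity(a, b):
    """Fraction of matching positions (an assumed, simplistic measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_graph(nodes, threshold):
    """Map node-id pairs to weights for k-mers at or above the threshold."""
    edges = {}
    for (i, a), (j, b) in combinations(nodes, 2):
        w = similarity(a, b)
        if w >= threshold:
            edges[(i, j)] = w
    return edges


def densest_subgraph(node_ids, edges):
    """Greedy peeling heuristic (not fractional programming).

    Repeatedly removes the node with the lowest weighted degree and
    returns the node set with the highest total-weight / node-count
    ratio seen along the way.
    """
    remaining = set(node_ids)
    degree = {n: 0.0 for n in remaining}
    for (i, j), w in edges.items():
        degree[i] += w
        degree[j] += w
    total = sum(edges.values())
    best, best_density = set(remaining), total / len(remaining)
    while len(remaining) > 1:
        worst = min(remaining, key=lambda n: (degree[n], n))
        remaining.remove(worst)
        total -= degree[worst]  # weight of edges from worst to remaining nodes
        for (i, j), w in edges.items():
            if i == worst and j in remaining:
                degree[j] -= w
            elif j == worst and i in remaining:
                degree[i] -= w
        density = total / len(remaining)
        if density > best_density:
            best, best_density = set(remaining), density
    return best, best_density
```

A typical call would be `nodes = kmers(seq, 6)`, then `edges = build_graph(nodes, 0.8)`, then `densest_subgraph([i for i, _ in nodes], edges)`. The returned positions are candidate motif occurrences only under the assumptions above.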
Keywords/Search Tags: Binding sites, Transcription factor binding, Promoters, Detect, DNA, Sequence, Methods