A novel statistical approach for sigma 28 promoter prediction in eubacteria

Posted on: 2010-08-05
Degree: Ph.D
Type: Thesis
University: The Johns Hopkins University
Candidate: Song, Wenjie
Full Text: PDF
GTID: 2449390002470751
Subject: Biology
Abstract/Summary:
Alongside the availability of substantial amounts of microbial genome sequence data, computational approaches for analyzing genetic codes have evolved tremendously during the last two decades. Promoter prediction, an area of particular interest to computational biologists, can help experimentalists move towards a clearer understanding of gene regulation. However, the problem has not proven easy to resolve, both because promoter sequences do not always possess well-defined characteristics and because the number of experimentally identified promoter sequences is very limited. Consequently, it has proven difficult to develop a prediction algorithm that avoids generating a large number of false positives while still accommodating features that may or may not be present in known promoter sequences.

The aim of this thesis was to develop a novel multi-step algorithm for genome-scale sigma28 promoter prediction that could be applied to bacteria lacking experimentally identified promoter sequences. Pattern matching was first used to identify putative promoter sequences upstream of motility and chemotaxis genes, because sigma28 has a well-documented role in motility. A position-specific score matrix was generated from these promoters and then used to identify promoters on a genome-wide scale. An iterative step was incorporated to allow for species-specific promoter variation.

This approach was first applied to predict sigma28 promoters in several gamma-Proteobacteria. Predicted promoters were validated by cross-species comparison or by transcriptional profiling. Several potential novel motility and chemotaxis genes were discovered. However, our analysis showed that the algorithm is not appropriate for sigma28 promoter prediction in certain non-gamma-Proteobacteria. This was particularly the case when the preliminary pattern-matching (PM) approach predicted only a few promoters, or when the amino acid sequence of the test species' sigma28 matched E. coli sigma28 poorly. Nevertheless, the sigma28 promoter prediction results in non-gamma-Proteobacteria were useful for identifying various relationships between sigma28 and the anti-sigma28 factor, FlgM. This suggested that motility and chemotaxis regulation systems are quite diverse in the Eubacteria. The results of this study will be of particular interest to researchers studying bacterial motility and chemotaxis, and will also benefit those using systems approaches to study bacterial physiology. In the future, the multi-step algorithm could be modified to predict other types of statistically underrepresented DNA sequences.
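
The multi-step procedure summarized above (pattern matching, then a position-specific score matrix, then an iterative genome-wide step) could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the actual search pattern, pseudocounts, background frequencies, score threshold, or stopping rule, so the function names, the toy motif, and every parameter value here are hypothetical.

```python
import math
import re

BASES = "ACGT"


def pattern_match(upstream_seqs, pattern):
    """Step 1 (assumed form): collect regex matches in upstream regions."""
    sites = []
    for seq in upstream_seqs:
        sites.extend(m.group(0) for m in re.finditer(pattern, seq))
    return sites


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Step 2: log-odds position-specific score matrix from equal-length sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    total = len(sites) + pseudocount * len(BASES)
    pssm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pssm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pssm


def score(pssm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pssm, window))


def scan(pssm, genome, threshold):
    """Step 3: slide the matrix along a sequence and keep windows >= threshold."""
    width = len(pssm)
    hits = []
    for i in range(len(genome) - width + 1):
        window = genome[i:i + width]
        if not set(window) <= set(BASES):
            continue  # skip windows containing ambiguous bases such as N
        s = score(pssm, window)
        if s >= threshold:
            hits.append((i, window, s))
    return hits


def refine(seed_sites, genome, threshold, max_rounds=5):
    """Step 4 (assumed form): rebuild the matrix from its own hits until stable."""
    sites = sorted(set(seed_sites))
    for _ in range(max_rounds):
        hits = scan(build_pssm(sites), genome, threshold)
        new_sites = sorted({window for _, window, _ in hits})
        if not new_sites or new_sites == sites:
            break
        sites = new_sites
    return build_pssm(sites), sites
```

In this sketch, seed sites found near motility and chemotaxis genes would be passed to `build_pssm`, and `refine` would then be run on the target genome so that the matrix can drift towards species-specific variants. The fixed-width window is a simplification; a real promoter model would likely need to handle separate upstream elements and variable spacing.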
Keywords/Search Tags: Promoter, Approach, Sequences, Novel, Sigma, Algorithm