
On Predicting Regulatory Motifs And Regulons In Prokaryotic Genomes

Posted on: 2015-06-27 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: C Zhou | Full Text: PDF
GTID: 1220330467965987 | Subject: Operational Research and Cybernetics
Abstract/Summary:
Bioinformatics is an interdisciplinary field that has developed rapidly in recent years. It combines knowledge from biology, mathematics, computer science and other areas for the analysis of biological data. Sequence analysis is a major part of bioinformatics, and DNA sequence motif prediction is an important research area within it. The identification of transcription factor binding sites (TFBSs) has been one of the most widely studied forms of the problem, both for its biological significance and for its computational difficulty. The main topic of this thesis is the prediction of gene expression regulatory motifs and the elucidation of regulons in prokaryotic genomes.

Genes must be expressed as proteins to perform their biological functions, and their expression must be regulated in response to internal and external conditions. Expression regulation in prokaryotes is achieved mainly through the interaction between RNA polymerase and regulatory proteins. Regulatory proteins recognize and bind specific DNA sequences in the genome and thereby perform their regulatory function. These specific sequences are called transcription factor binding sites (TFBSs). The binding sites of the same regulatory protein usually have the same length and are highly conserved in sequence. This conserved sequence pattern is called a cis-regulatory motif. In prokaryotes, multiple contiguous genes often form an operon to facilitate co-expression; a single gene can be regarded as a special case of an operon. The set of operons regulated by the same regulatory protein is called a regulon. In this thesis, we first give a brief introduction to the model representations and prediction methods for regulatory motifs.
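As an illustration of what a conserved motif looks like numerically, the following minimal Python sketch builds per-column base frequencies from a set of aligned binding sites and computes the information content of each column. It is not taken from the thesis: the example sites are made up, the uniform background frequency of 0.25 is an assumption, no pseudocounts are applied, and the function names `column_frequencies` and `information_content` are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def column_frequencies(sites):
    """Return one {base: frequency} dict per column of the aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all binding sites must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        freqs.append({base: counts[base] / len(sites) for base in BASES})
    return freqs


def information_content(freqs, background=0.25):
    """Per-column information content in bits.

    Assumes a uniform background (0.25 per base); terms with zero
    frequency contribute nothing to the sum.
    """
    return [
        sum(p * math.log2(p / background) for p in col.values() if p > 0)
        for col in freqs
    ]


# Made-up example sites, for illustration only.
sites = ["TTGACA", "TTGACA", "TTGATA", "TTGACT"]
ic = information_content(column_frequencies(sites))
print([round(x, 3) for x in ic])
```

Under these assumptions a fully conserved column reaches the maximum of 2 bits, and columns with mixed bases score lower, so the per-column values summarize how strongly each position is conserved.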
Based on the prediction method and the distribution pattern of TFBSs across the whole genome, we designed a method to measure the biological functional significance of predicted motifs, which can be used to filter out false-positive predictions; using information content, we designed methods for comparing and clustering motifs; and using the hypergeometric distribution, we analyzed the co-occurrence patterns of different motifs. This series of methods constitutes the motif analysis toolkit BoBro2.0. Compared with another popular program, MEME, BoBro2.0 achieves significantly higher prediction accuracy and provides a number of unique motif analysis utilities. The software can be downloaded freely at http://code.google.com/p/bobro/.

By combining motif prediction with phylogenetic footprinting, we developed a new method for genome-wide regulon prediction. Phylogenetic footprinting allows us to discover putative regulatory motifs, but the results include many false positives. To overcome this problem, we designed a motif filtering method based on a bipartite graph. It yields a score that reflects the co-regulation relationship between a pair of operons: the larger the score, the more likely the two operons belong to the same regulon or regulons. We discard motifs that are not involved in any such high-scoring operon pair. We then construct a motif similarity graph in which nodes are motifs and edge weights are the similarity scores between them; the whole graph represents the similarity relationships among all putative motifs. Taking known regulons as subsets of the nodes, we analyzed the subgraphs induced by these nodes and found that they have higher edge densities and clustering scores than the whole graph, which shows that the graph reflects regulon structure. With this finding, clustering methods can be designed to predict regulons.
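Two of the statistics mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only, not the BoBro2.0 implementation: the function names, the toy graph, and the choice to count an edge as present when its weight is at least `min_weight` are all assumptions. The first function gives an upper-tail hypergeometric p-value for how often two motifs co-occur; the second compares the edge density of an induced subgraph with that of the whole motif similarity graph.

```python
from math import comb


def cooccurrence_pvalue(total, with_a, with_b, with_both):
    """Upper-tail hypergeometric p-value P(X >= with_both).

    total:     number of sequences examined (e.g. promoter regions)
    with_a:    sequences containing motif A
    with_b:    sequences containing motif B
    with_both: sequences containing both motifs
    """
    upper = min(with_a, with_b)
    favourable = sum(
        comb(with_a, i) * comb(total - with_a, with_b - i)
        for i in range(with_both, upper + 1)
    )
    return favourable / comb(total, with_b)


def edge_density(nodes, weights, min_weight=0.0):
    """Fraction of possible node pairs joined by an edge.

    weights maps frozenset({u, v}) to a similarity score; an edge counts
    when both endpoints are in `nodes` and its score is >= min_weight
    (a hypothetical thresholding rule used only for this sketch).
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    present = sum(
        1 for pair, w in weights.items() if pair <= nodes and w >= min_weight
    )
    return present / (n * (n - 1) / 2)


# Toy motif similarity graph with made-up scores.
weights = {
    frozenset({"m1", "m2"}): 0.9,
    frozenset({"m1", "m3"}): 0.8,
    frozenset({"m2", "m3"}): 0.7,
    frozenset({"m4", "m5"}): 0.6,
}
all_motifs = {"m1", "m2", "m3", "m4", "m5"}
known_regulon = {"m1", "m2", "m3"}  # hypothetical regulon membership

whole = edge_density(all_motifs, weights)
induced = edge_density(known_regulon, weights)
print(whole, induced)
```

In this toy graph the subgraph induced by the hypothetical regulon is denser than the whole graph, which is the kind of contrast the abstract reports for known regulons.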
We compared our score with two other scores that reflect co-regulation, and ours reflects co-regulation more accurately; because we use motifs as nodes, our method avoids the difficulties that overlapping regulons introduce into clustering-based regulon prediction. Last but not least, the whole process is based solely on genomic sequence information, which makes our method especially useful for newly sequenced genomes.

We developed an operon-centric online database, DOOR2.0, which integrates all the data involved in motif discovery and more. DOOR2.0 contains genome-scale operons for 2,072 prokaryotes with complete genomes, together with functional annotation and experimentally confirmed TFBSs. Compared with its previous version, published in 2009, DOOR2.0 has a number of new features, including (ⅰ) more than 250,000 transcription units, experimentally validated or computationally predicted based on RNA-seq data, providing a dynamic functional view of the underlying operons; (ⅱ) an integrated operon-centric data resource that provides not only operons for each covered genome but also their functional and regulatory information, such as cis-regulatory binding sites and promoter and terminator structures; (ⅲ) a high-performance web service for online operon prediction on user-provided genomic sequences; (ⅳ) an intuitive genome browser to support visualization of user-selected data; and (ⅴ) a keyword-based, Google-like search engine for finding the needed information intuitively and rapidly in the database. DOOR2.0, which is available online at http://csbl.bmb.uga.edu/DOOR/, will be updated on a regular basis.

Finally, we systematically analyzed 40 Clostridium genomes with comparative genomics methods and motif analysis tools, paying special attention to genes and functions related to biomass degradation. Through this research, we both obtained biologically useful findings and validated our motif analysis methods.
Keywords/Search Tags: Regulatory motif, Regulon prediction, Algorithm design, Operon database, Clostridia