Elucidation And Identification Of Regulons For Prokaryotic Genomes

Posted on: 2011-03-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Ma
Full Text: PDF
GTID: 1100330332481360
Subject: Operational Research and Cybernetics
Abstract/Summary:
Bioinformatics is an interdisciplinary field that draws on biology, computer science, mathematics, physics and other sciences. Over the last decade, its rapid development has dramatically advanced biological research and has also posed many challenging problems for other fields. The main topic of this thesis is the elucidation and identification of regulons in prokaryotes through combinatorial algorithms.

The basic transcriptional units in prokaryotic genomes are operons (rather than genes), which are organized into higher-level structures called regulons; these constitute the basic units of the global transcription regulation network in a bacterial cell. The ability to elucidate all regulons is essential for understanding and studying the global transcription regulation network encoded in a bacterial genome. Each regulon becomes visible to experimental studies only under the right conditions, and trying all possible conditions in order to expose each of the hundreds or possibly more regulons encoded in a bacterial genome is clearly unrealistic. We therefore propose a novel computational framework for regulon elucidation, aiming to find the vast majority of the regulons encoded in a prokaryotic genome. The framework builds on three capabilities that we have recently developed: (a) accurate prediction of cis-regulatory motifs in extremely complicated data environments, (b) a powerful biclustering analysis tool for gene expression data, and (c) accurate prediction of orthologous genes by comparing genomes.

First, we present new software, BOBRO, for the prediction of cis-regulatory motifs in a given set of promoter sequences. 
The algorithm substantially improves on the state of the art in motif finding through two key ideas: (1) a highly effective method for accurately identifying each conserved motif by finding maximal cliques in a graph defined over the sequence positions that have a high probability of being the starts of conserved motifs; and (2) a highly reliable way of recognizing actual motifs among the identified cliques, based on a new concept of motif closure. We systematically compared the prediction performance of BOBRO with that of five popular prediction programs on large-scale data sets and found that BOBRO is at least 42% more accurate than the best-performing of the five across all test data sets. In particular, our genome-scale application of BOBRO to E. coli K12 identified 1,472 experimentally confirmed cis-regulatory motifs, which provide a foundation for regulon prediction.

Second, we report a QUalitative BIClustering algorithm (QUBIC) that solves the biclustering problem in a more general form than existing algorithms by combining qualitative measures of gene expression data with combinatorial optimization techniques. One key feature of QUBIC is that it can identify all statistically significant biclusters, including biclusters with so-called scaling patterns, a problem considered rather challenging. Another is its efficiency: it can solve biclustering problems with tens of thousands of genes under up to thousands of conditions in a few minutes of CPU time on a desktop computer. On various benchmark sets and additional data sets, our algorithm shows much improved biclustering performance compared with existing algorithms.

Third, we developed a novel algorithm, GOST, to determine the orthologous relationships between related prokaryotic genomes. 
The key ideas that distinguish it from other methods are (1) integrating evolutionary information about operon structures into the procedure; and (2) modeling orthology detection as a global optimization problem whose solution captures the orthologous relationships among the massive number of homologous relationships between the two genomes under consideration. We compared GOST with three other popular orthology detection programs, using coverage and error rate metrics on a large set of prokaryotic genomes, and found that GOST consistently outperforms the best of the three by a substantial margin. Of the gene pairs that GOST detects as orthologous and that differ from the orthologous relationships detected by RBH, 77% are further confirmed as orthologous by three other biologically related means. GOST is computationally efficient enough to find the orthologous relationships between two genomes within a few minutes.

Finally, we solve the regulon-prediction problem using the predicted orthologous genes and cis motifs with a combinatorial optimization algorithm, REGUP, aiming to find all the regulons encoded in a prokaryotic genome. Currently, no published computer program attempts to solve this problem at a genome scale. We then validate and refine our methods using the 178 known E. coli regulons in RegulonDB and microarray gene expression data collected on E. coli under 466 conditions in M3D. Once fully tested, all the tools developed in this thesis will be made publicly available, along with the predicted regulons.
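
The clique-based view of motif finding mentioned in the abstract can be illustrated with a minimal Python sketch. It is not BOBRO's implementation: the abstract does not specify how the graph is built or how motif closure works, so the graph used here (one vertex per k-mer start position, an edge between k-mers from different sequences that differ in at most max_mismatch positions), the function names and the toy sequences are all assumptions made for illustration. Maximal cliques are enumerated with the standard Bron-Kerbosch algorithm with pivoting.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def build_graph(seqs, k, max_mismatch):
    """Assumed graph: vertices are (sequence index, start position) pairs;
    two vertices from different sequences are joined when their k-mers
    differ in at most max_mismatch positions."""
    nodes = [(i, p) for i, s in enumerate(seqs) for p in range(len(s) - k + 1)]
    adj = {v: set() for v in nodes}
    for u, v in combinations(nodes, 2):
        if u[0] == v[0]:
            continue  # no edges within the same sequence
        kmer_u = seqs[u[0]][u[1]:u[1] + k]
        kmer_v = seqs[v[0]][v[1]:v[1] + k]
        if hamming(kmer_u, kmer_v) <= max_mismatch:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def largest_clique(seqs, k, max_mismatch):
    """Return the largest maximal clique as a sorted list of vertices."""
    cliques = maximal_cliques(build_graph(seqs, k, max_mismatch))
    return sorted(max(cliques, key=len))


# Hypothetical toy promoters, each containing one copy of TTGACA.
toy_seqs = ["ACGTTGACAGGC", "GGTTGACATCCA", "CATTTGACAGTA"]
print(largest_clique(toy_seqs, k=6, max_mismatch=0))
```

With max_mismatch set to 0, the largest clique in this toy input consists of the three planted TTGACA start positions, one per sequence. Allowing mismatches admits degenerate sites but also produces more competing cliques, so a further step is needed to decide which cliques correspond to actual motifs.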
Keywords/Search Tags: Bioinformatics, Prokaryotic genome, Combinatorial algorithm, Transcription factor binding sites, Motif, Gene expression data, Orthologous genes, Operon, Regulon