Prediction Of Eukaryotic Gene Transcriptional Regulation Elements And Networks

Posted on: 2012-06-24  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J Wang
GTID: 1260330425482883  Subject: Biology
Abstract/Summary:
Transcriptional regulation is the key step in gene regulation. It is achieved through the binding of transcription factors (TFs) to specific regulatory sequences, called cis-elements, upstream of genes. However, current algorithms for identifying cis-elements are far from satisfactory, which remains a major obstacle to the computational analysis of gene transcriptional regulation and to the inference of networks from genome data and various kinds of high-throughput data. To obtain eukaryotic gene transcriptional regulation relations and networks, we undertook a systematic study of cis-elements, including their over-representation property and a statistical model for measuring it. We also developed a new tool for the computational prediction of cis-element motifs from orthologous data and gene co-expression data. The tool combines the statistical model with a new mutation degree model proposed in this study, and it was specifically designed for the practical situation in which orthologous data are scarce. In addition, we constructed a human global transcriptional regulation network by searching for known cis-elements in the conserved promoter regions of the whole genome. This network can serve as a reference for the construction and analysis of specific networks of various sizes. A summary of these studies follows.
1. Over-represented k-mers in non-coding genomic regions often lead to the identification of potential transcriptional regulatory sites (TRS), or motifs of cis-elements. Many algorithms exploit this property to predict regulatory motifs in silico. Improving these algorithms, however, requires a deeper understanding of the enrichment feature.
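As a rough illustration of what over-representation means here, the sketch below compares the frequency of a candidate site in a region of interest with its frequency in a background sequence. It is a minimal sketch under stated assumptions, not the statistical model of the dissertation: the function names are hypothetical, the site is matched as an exact string on a single strand, and the choice of background is left to the caller.

```python
def count_occurrences(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps (one strand only)."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def frequency(seq: str, motif: str) -> float:
    """Occurrences per possible start position; 0.0 if seq is shorter than motif."""
    windows = len(seq) - len(motif) + 1
    if windows <= 0:
        return 0.0
    return count_occurrences(seq, motif) / windows


def enrichment_ratio(region: str, background: str, motif: str):
    """Ratio of the motif frequency in region to that in background.

    Returns None when the motif never occurs in the background, because
    the ratio is undefined in that case. A value above 1 suggests
    over-representation in the region (hypothetical helper, exact
    matching is an assumption).
    """
    bg = frequency(background, motif)
    if bg == 0.0:
        return None
    return frequency(region, motif) / bg


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    region = "TGACTCATTTGACTCA"
    background = "TGACTCA" + "A" * 13
    print(enrichment_ratio(region, background, "TGACTCA"))
```

A real analysis would also need to handle degenerate consensus symbols, both strands, and a significance estimate; the ratio alone only indicates the direction of the effect.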
To obtain a general distributional profile of TRS in different regions of genomes, as well as in different genomes, we performed a systematic analysis of the over-representation of TRS in the intergenic regions and gene upstream regions of yeast and viral genomes. We also explored how to evaluate the accuracy of TRS consensus sequences by measuring their enrichment. To measure enrichment, we introduced a statistical background model that compares the TRS frequency in a given region of the genome with either the frequency in the whole genome or the frequency in exon regions. This model was applied to different classes of non-coding genomic regions in four genomes. Most TRS were over-represented in the intergenic regions of the Saccharomyces cerevisiae, Schizosaccharomyces pombe and Epstein-Barr virus (EBV) genomes. The enrichment of S. cerevisiae TRS in the 600 bp upstream region of genes was also significant. In the Drosophila genome, TRS did not show enrichment in intergenic and intron regions when the TRS frequency in the whole genome was taken as background, as was done for the other genomes. However, when the TRS frequency in exon regions was taken as background, over 70% of TRS were over-represented in those two classes of non-coding regions. This indicates the existence of transcriptional regulatory signals in introns. Some S. cerevisiae TRS have inconsistent consensus sequences that show different levels of enrichment in intergenic regions; their analysis suggests that the accuracy of experimentally determined TRS can be evaluated by measuring their enrichment in non-coding genomic regions.
2. We propose a new prediction method that directly identifies the motifs of transcriptional regulation elements from orthologous promoters, based on a series of features including sequence conservation and over-representation. In some cases methods of this kind are not easy to apply: for plants, for example, it is difficult to collect a reliable set of orthologous genes.
To overcome this problem, we propose a new mutation degree model and developed a new tool, called OCW, by combining this model with the over-representation property of functional elements in co-expressed gene sets, which is measured in this step by Fisher's exact test. The proposed method was tested on various types of data and against several other tools: i) the test on synthetic data shows good results in terms of positive predictive value and specificity, while the weak point of the method appears to be its low sensitivity; ii) the tool was further tested on biological data consisting of 7 sets of co-expressed genes from Arabidopsis for which the motifs have been published, and the proposed method achieved better results than six other tools; iii) the test on noisy data shows that the method has a greater tolerance to unreliable orthologous genes than two other tools. Because the new method adopts a strategy that does not use the phylogenetic tree of orthologous promoters, it is more applicable to the various approaches to transcriptional regulation analysis in which data quality cannot be ensured.
3. Obtaining a global gene transcriptional regulation network at the scale of the whole genome remains a big challenge in systems biology. Based on the hypothesis that functional elements and transcriptional regulation relations are conserved, we predicted the global transcriptional network for human by searching for matches of known cis-elements throughout the conserved promoter regions of protein-coding genes and microRNA genes in the human genome. We also included regulation by microRNAs in the global network by predicting their targets in the 3' UTRs of coding genes. Furthermore, a series of tissue networks was predicted using gene expression data. The topological structure and network features were analyzed, and the tissue networks were compared with the global network.
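The over-representation step in part 2 is measured with Fisher's exact test. The sketch below shows one way such a test could be set up for a candidate motif in a co-expressed gene set. It is an illustration under assumptions, not the OCW implementation: the function name, the 2x2 table layout, and the one-sided (over-representation) alternative are choices made here, and the counts in the usage example are invented.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    Assumed 2x2 table layout (hypothetical):
        a = co-expressed genes whose promoter contains the motif
        b = co-expressed genes without the motif
        c = other (background) genes with the motif
        d = other (background) genes without the motif

    Returns the probability, under the hypergeometric null, of seeing
    at least `a` motif-containing genes in the co-expressed set.
    """
    n_set = a + b          # size of the co-expressed set
    n_motif = a + c        # genes containing the motif overall
    total = a + b + c + d  # all genes considered
    denom = comb(total, n_set)
    upper = min(n_set, n_motif)
    # Sum the hypergeometric probabilities for x = a, a + 1, ..., upper.
    tail = sum(
        comb(n_motif, x) * comb(total - n_motif, n_set - x)
        for x in range(a, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Invented counts: 8 of 10 co-expressed genes carry the motif,
    # against 50 of 990 background genes.
    print(fisher_exact_greater(8, 2, 50, 940))
```

A small p-value suggests that the motif occurs in the co-expressed set more often than expected by chance. When many candidate motifs are tested, a multiple-testing correction would normally be applied as well.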
Keywords/Search Tags: transcriptional regulation, cis-element, transcription factor binding site (TFBS), motif prediction, transcriptional regulation network, over-representation, evolutionary conservation