The Theoretical Studies Of Promoter Based On Multi-Features Fusion

Posted on: 2012-11-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y C Zuo
Full Text: PDF
GTID: 1100330335473039
Subject: Biophysics
Abstract/Summary:
Transcription initiation is the most important regulatory stage of gene expression, and elucidating its regulatory mechanism plays a crucial role in functional genome annotation. Gene transcription is initiated when the core promoter recruits a variety of regulatory elements. Promoter research is therefore essential for studying transcriptional regulation, downstream target genes, and signaling pathways. In addition, how the diversity of core promoters is matched to a variety of promoter recognition factors is a new problem in RNA polymerase II transcription initiation.

Different genes usually show different mechanisms of expression regulation, so promoter patterns are more complex than previously understood. GC properties are well-known global factors that influence promoter characteristics and gene expression, and positional regulation of functional elements is proving to be evident and widespread. The recognition accuracies for promoters and transcription start sites (TSSs) are known to be far lower than those for protein-coding regions. In this dissertation, GC-skew/profile, positional conservation, DNA geometric flexibility, and the localization of regulatory elements are discussed for the promoters of four species. Multi-feature fusion methods are also introduced to improve the accuracy of promoter prediction. The main results are as follows.

First, based on an analysis of nucleotide content, dinucleotide bias, and positional GC composition in the promoters of different species, the results showed that human gene promoters have a unique GC bias. The promoter regions of different genomes show specific patterns of GC/AT-skews and GC/AT-profiles. In human promoters, positional conservation is more significant near the TSS than in other regions. The Drosophila promoter has an obvious conserved region 80 bp upstream of the TSS. In plant promoters, there is no obvious positionally conserved region apart from the locations of the TATA-box and INR elements. A comparison of the geometric structure of DNA flexibility revealed distinctive flexibility and stiffness curvature, indicating that different promoter types usually show specific structural patterns in the transcription regions. These hidden geometric deformation codes are helpful for the actual process of protein-DNA interaction during transcription initiation.

Second, by analyzing the localization of regulatory motifs around the transcription start sites, we found that the positional regulation of regulatory elements is consistent with positional conservation, and that motif localization differs markedly among the four species. A new -40 bp element, the GGAAG regulatory motif, was identified for the first time in the human promoter region. A TA-repeat element was also found at the -80 bp region upstream of the transcription start site of Drosophila genes. The Gene Ontology (GO) annotations of these motifs are further discussed.

Third, since the positional regulation of regulatory elements is correlated with the tissue specificity of the downstream transcript, we mainly analyzed the distributions of the TATA-box and the TC-element in the four model species. The results showed that a number of Drosophila TATA-boxes are preferentially located at the -197 bp, -195 bp, 184 bp and -165 bp positions. A large number of TC-elements were found for the first time in genes lacking a TATA-box. TC-elements might constitute a class of novel regulatory elements that participate in modulating plant gene expression; genes containing TC-elements were generally expressed under specific conditions. The TATA-box of eukaryotic promoters usually contains a highly rigid "AAAA" purine tail, which may be an important signal guiding proteins to locate the target site accurately. The TATA-box of prokaryotic promoters preferentially contains a highly rigid "GCGC" pyrimidine tail. By comparing motif localization across the different σ types of E. coli promoters, we found that the CTGGCA motif tends to be located at the -24 bp site of σ54 promoters and that the TG[CA]CGATAA motif is preferentially associated with σ28 promoters.

Finally, the DNA flexibilities of the promoter, coding, and intergenic regions of the E. coli genome were analyzed. The results showed that the specific structural patterns in the promoter region, that is, DNA geometric parameters derived from the three-dimensional structure, describe transcription initiation in the prokaryotic genome well. To address the most critical problems of promoter recognition, namely feature extraction and algorithm improvement, we introduced a new parameter quadratic integration technique and a hybrid multi-feature approach that combines signal features, content features, and DNA flexibility features. This method was applied to recognize TATA and TATA-less promoters in plant genomes, and the prediction evaluation achieved the best accuracy to date. Based on the latest histone modification profiles, a new human promoter recognition algorithm was developed by integrating epigenetic information with multiple DNA sequence features. The good prediction results showed that histone modifications are important for effectively improving promoter recognition. In addition, the effects of chromatin epigenetic features, k-mer content, and DNA structure on the prediction results are also discussed in this thesis.
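As a rough illustration of two quantities the abstract relies on, the sketch below computes windowed GC/AT skew over TSS-aligned sequences and tallies motif start positions relative to the TSS. It is not the dissertation's implementation. The function names, the window size, the conventional skew definitions (G - C)/(G + C) and (A - T)/(A + T), and the toy sequences are all assumptions made for this example.

```python
import re
from collections import Counter


def skew_profile(seqs, window=10):
    """Windowed GC skew and AT skew pooled over TSS-aligned sequences.

    Assumes all sequences have equal length and share the same TSS offset.
    GC skew = (G - C) / (G + C); AT skew = (A - T) / (A + T).
    Returns a list of (window_start, gc_skew, at_skew) tuples.
    """
    length = len(seqs[0])
    profile = []
    for start in range(0, length - window + 1, window):
        counts = Counter()
        for s in seqs:
            counts.update(s[start:start + window].upper())
        g, c, a, t = counts["G"], counts["C"], counts["A"], counts["T"]
        gc = (g - c) / (g + c) if g + c else 0.0  # 0.0 when the window has no G/C
        at = (a - t) / (a + t) if a + t else 0.0  # 0.0 when the window has no A/T
        profile.append((start, gc, at))
    return profile


def motif_positions(seqs, pattern, tss_index):
    """Count motif start positions relative to the TSS (negative = upstream).

    `pattern` is a regular expression, so a degenerate motif such as
    TG[CA]CGATAA can be passed as written.
    """
    # A lookahead lets overlapping occurrences be counted as well.
    regex = re.compile(f"(?=({pattern}))", re.IGNORECASE)
    hist = Counter()
    for s in seqs:
        for m in regex.finditer(s):
            hist[m.start() - tss_index] += 1
    return hist


# Toy, made-up sequences (not data from the dissertation); TSS assumed at index 15.
toy = [
    "ACGTGGAAGTCCGGATATAAAAGGC",
    "TTGGAAGCCGCGGCATATAAAAGCC",
]
for start, gc, at in skew_profile(toy, window=5):
    print(start, round(gc, 2), round(at, 2))
print(dict(motif_positions(toy, "GGAAG", tss_index=15)))
```

Pooling counts across sequences before taking the ratio keeps windows with few G/C (or A/T) bases from dominating the profile; averaging per-sequence skews would be an equally defensible choice.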
Keywords/Search Tags: transcription regulation, gene promoter, regulatory motif, positional regulation, DNA geometric structure, epigenetic marks, parameter quadratic integration, support vector machine