
Theoretical Prediction Of Promoters Based On Physical And Chemical Characteristics Of The Sequence

Posted on: 2016-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: E Z Deng
Full Text: PDF
GTID: 2180330473959553
Subject: Biophysics
Abstract/Summary:
Promoters are essential regulatory elements that determine the timing and level of gene expression. They play important roles in gene expression, metabolic regulation, the construction of expression systems, and so on. With the avalanche of genome sequences generated in the post-genomic age, developing a fast and effective method to identify promoters has become a key research topic in epigenetics. At present, most computational methods have limitations, such as not considering the physical structural characteristics of DNA, not taking global sequence correlation into account, and not performing feature selection. Based on these considerations, this thesis develops a new prediction method to identify promoters.

This thesis focuses on bioinformatics research on prokaryotic sigma-54 promoters. With a new feature extraction method, the classification accuracy was dramatically improved. Based on this method, a free online web server called iPro54-PseKNC was developed for researchers to use.

First, we used a new feature extraction method called ‘pseudo k-tuple nucleotide composition’, which characterizes a sequence by six local structural properties of DNA base pairs rather than by the frequency of k-tuple nucleotides alone. Subsequently, a support vector machine was used to discriminate between sigma-54 promoters and non-promoters. Jackknife cross-validation shows that high accuracy was obtained in predicting sigma-54 promoters. To verify the performance of the method, experimentally confirmed sigma-54 promoters that are independent of the training data were collected and predicted with it. The results show that the proposed method can recognize these sigma-54 promoters, demonstrating its good performance.
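The feature extraction step can be illustrated with a minimal sketch. It assumes the commonly published form of pseudo k-tuple nucleotide composition, in which normalized k-tuple frequencies are combined with correlation factors computed from dinucleotide property values; the abstract does not give the exact formula, so this may differ from the thesis. The names `pseknc`, `TOY_PROPS`, and the parameters `k`, `lam`, and `w` are hypothetical, and the six property values per dinucleotide are placeholders, not measured structural parameters.

```python
from itertools import product

BASES = "ACGT"

# PLACEHOLDER values only: six made-up numbers per dinucleotide.
# Real use would substitute measured local structural properties.
TOY_PROPS = {
    a + b: [((BASES.index(a) * 4 + BASES.index(b)) * (v + 1)) % 7 / 7
            for v in range(6)]
    for a in BASES for b in BASES
}

def kmer_frequencies(seq, k):
    """Normalized occurrence frequency of every k-tuple, in lexicographic order."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[m] / total for m in kmers]

def correlation_factors(seq, props, lam):
    """Tier-1..lam correlation factors between dinucleotides j steps apart."""
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    thetas = []
    for j in range(1, lam + 1):
        terms = len(dinucs) - j
        total = 0.0
        for i in range(terms):
            a, b = props[dinucs[i]], props[dinucs[i + j]]
            # Mean squared difference over the property values.
            total += sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
        thetas.append(total / terms)
    return thetas

def pseknc(seq, props, k=2, lam=3, w=0.5):
    """Feature vector of length 4**k + lam whose components sum to 1."""
    freqs = kmer_frequencies(seq, k)
    thetas = correlation_factors(seq, props, lam)
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]

vector = pseknc("ACGTACGTTGCAAGCT", TOY_PROPS)
print(len(vector))
```

In a pipeline like the one described, one such vector per sequence would be passed to a support vector machine for training and jackknife (leave-one-out) evaluation; that step is not shown here.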
For the convenience of experimental scientists, a web server was constructed and is freely accessible at http://lin.uestc.edu.cn/server/iPro54-PseKNC.

To better understand the biological mechanism of the sequences upstream of the translation initiation site, the distribution of the distance between the transcription start site and the translation initiation site was investigated. Statistical analysis and mathematical deduction demonstrate that this distance follows a gamma distribution and shares similar properties with other phenomena in the life sciences, shedding light on distance distributions in biology.

Finally, to make the sigma-54 promoter data convenient for other scientists to use, a database of sigma-54 promoters named Pro54DB was built. At present, Pro54DB stores experimentally confirmed sigma-54 promoter sequences and relevant information, including gene regulation, the function of the gene product, and the source organism. In addition, the database provides keyword search, BLAST sequence alignment, prediction, statistics, and other related functions. These basic functions are sufficient for most researchers.
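The claim that the distance between the transcription start site and the translation initiation site follows a gamma distribution can be illustrated with a small fitting sketch. It uses method-of-moments estimation, which is an assumption chosen for simplicity and not necessarily the deduction used in the thesis. The function name `fit_gamma_moments` is hypothetical, and the distances below are synthetic samples drawn from a known gamma distribution, not data from the thesis.

```python
import random
from statistics import fmean, pvariance

def fit_gamma_moments(distances):
    """Estimate gamma shape and scale from the sample mean and variance.

    For a gamma distribution, mean = shape * scale and
    variance = shape * scale**2, so:
        shape = mean**2 / variance
        scale = variance / mean
    """
    m = fmean(distances)
    v = pvariance(distances, mu=m)
    return m * m / v, v / m

# Synthetic stand-in for observed distances (in nucleotides).
rng = random.Random(0)
synthetic = [rng.gammavariate(2.0, 30.0) for _ in range(20000)]

shape, scale = fit_gamma_moments(synthetic)
print(f"shape ~ {shape:.2f}, scale ~ {scale:.1f}")
```

With real distances, the fitted shape and scale could then be compared against the empirical histogram or checked with a goodness-of-fit test before concluding that the gamma model is adequate.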
Keywords/Search Tags: promoter prediction, pseudo nucleotide composition, feature selection, database, online service