
Prediction Of Bacterial Transcriptional Terminators Based On Sequence

Posted on: 2020-10-30; Degree: Master; Type: Thesis
Country: China; Candidate: C Q Feng; Full Text: PDF
GTID: 2370330596976643; Subject: Engineering
Abstract/Summary:
Transcription termination is an important regulatory step in gene expression and is determined by the terminator. Without a terminator, transcription cannot stop, which results in abnormal gene expression. Detecting terminators in bacteria helps determine operon structure and improves genome annotation, so accurate identification of transcriptional terminators is essential to research on transcriptional regulation. Although biochemical experiments can identify terminators clearly and accurately, wet-lab techniques are time-consuming and expensive. Computational methods have therefore been proposed, which fall mainly into two categories: (1) methods that describe terminators using nucleic acid composition information, and (2) methods that use the hairpin structure and the downstream T-rich region as features. Since these methods cannot reflect the statistical characteristics of the terminator, we propose using machine learning to identify bacterial terminators from sequence information.

In this thesis, we constructed two models, "iTerm-PseKNC" and "DeepTerm", for identifying bacterial transcription terminators based on a low-redundancy dataset. (1) "iTerm-PseKNC" was developed based on a support vector machine (SVM). A binomial distribution approach was used to select the optimal feature subset derived from pseudo K-tuple nucleotide composition (PseKNC). In five-fold cross-validation, this method achieved an accuracy of 95%. (2) "DeepTerm" is a convolutional neural network based model for terminator prediction that takes one-hot encoded sequences as input. In five-fold cross-validation, "DeepTerm" achieved an accuracy of 99.40%.

To further evaluate the generalization ability of the two models, they were tested on independent datasets of experimentally confirmed Rho-independent terminators from the Escherichia coli and Bacillus subtilis genomes. All the terminators in Escherichia coli were correctly identified by both "iTerm-PseKNC" and "DeepTerm", and 87.5% and 99.24% of the terminators in Bacillus subtilis were correctly identified by "iTerm-PseKNC" and "DeepTerm", respectively. These results suggest that the proposed models can be powerful tools for bacterial terminator recognition. For the convenience of wet-lab researchers, a web server for "iTerm-PseKNC" was established at http://lin-group.cn/server/iTerm-PseKNC/, through which users can obtain results without working through the underlying mathematical equations.
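
The two sequence representations named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `one_hot` and `ktuple_composition` are hypothetical, the treatment of ambiguous bases is an assumption, and the second function computes only the plain K-tuple nucleotide frequencies, without the additional pseudo components that PseKNC includes.

```python
from itertools import product

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as one 4-element row per base (order A, C, G, T).

    Assumption: bases outside ACGT (e.g. N) are encoded as all zeros.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def ktuple_composition(seq, k=2):
    """Return normalized frequencies of all 4**k K-tuples in lexicographic order.

    Assumption: windows containing a base outside ACGT are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    # Avoid division by zero for sequences shorter than k.
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    example = "GCCGCTTTTTT"  # made-up sequence, for illustration only
    print(one_hot(example)[:3])
    print(ktuple_composition(example, k=2))
```

A one-hot matrix of this kind is the sort of input a convolutional network would consume, while a fixed-length frequency vector is the sort of input a support vector machine would consume; the feature selection and model training steps described in the abstract are not shown here.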
Keywords/Search Tags: bacterial terminator, K-tuple nucleotide composition, support vector machine, convolutional neural network