
Predicting Transcription Factor Binding Sites and Promoters Based on Sequence Information

Posted on: 2008-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: K L Yang
GTID: 2120360215991357
Subject: Biophysics
Abstract/Summary:
Gene transcription regulation is a central challenge in bioinformatics, and an important step toward meeting it is the ability to identify transcription factor binding sites and promoters. Based on known transcription factor binding site and promoter sequences, a new position weight matrix scoring algorithm (PWMSA) for predicting transcription factor binding sites is presented. In addition, a support vector machine (SVM) model combined with the increment of diversity is used to predict promoters.

The site conservation indexes M_i are calculated from the differences in nucleotide probability at each position of the transcription factor binding sites. Transcription factor binding sites (TFBS) are then predicted using the site conservation indexes together with the position weight matrices (PWM).

First, the TFBS for 22 transcription factors in the Escherichia coli K-12 genome are predicted using PWMSA. In the self-consistency test and the 10-fold cross-validation test, the overall prediction accuracies are 87.59% and 86.45%, respectively.

Second, the TFBS for 9 transcription factors in the Saccharomyces cerevisiae genome are predicted using PWMSA. In the self-consistency test and the 10-fold cross-validation test, the overall prediction accuracies are 83.14% and 77.51%, respectively. Compared with ten other programs on the same performance measures and benchmark database, the overall prediction accuracy of PWMSA is 4% higher at the binding-site segment level and 7% higher at the nucleotide level.

Third, to account for the interdependence between bases in TFBS sequences, the pairwise nucleotide-dependent PWM and the Pre-conservative index vectors are incorporated into PWMSA. The predictions for the 9 transcription factors in the Saccharomyces cerevisiae genome are further improved: the overall prediction accuracies are 88.04% in the self-consistency test and 81.10% in the 10-fold cross-validation test. Finally, these results are discussed.

In the last part, content vectors and a signal vector are extracted from promoter sequences based on the six least increments of diversity, three kinds of position weight matrix, and the GC percentage of the sequences. These vectors are input into a support vector machine (SVM) algorithm to establish a promoter classification model. Human Pol II promoter sequences are predicted with the SVM in a 10-fold cross-validation test and an independent data test. The results show that both the sensitivity (overall prediction accuracy) and the specificity exceed 88%. To compare our results with other algorithms, our algorithm is applied to the same promoter dataset used by the other methods. It achieves a sensitivity of 97.00% and a specificity of 97.89%, which is better than the other top programs published to date.
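
The abstract does not give the formula for the conservation indexes M_i or the exact scoring rule, so the following is only a minimal sketch of the general idea of weighting a position weight matrix score by per-position conservation. It assumes that M_i is the information content of position i (2 minus the Shannon entropy, in bits) and that a candidate segment is scored by the M_i-weighted sum of log-odds values against a uniform background. Both choices, along with the function names, the pseudocount, and the toy sites, are illustrative assumptions and not the PWMSA of the thesis.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Estimate per-position base probabilities from aligned binding sites."""
    length = len(sites[0])
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm

def conservation_index(pwm):
    """Assumed stand-in for M_i: information content, 2 - entropy (bits)."""
    return [2.0 + sum(p * math.log2(p) for p in col.values()) for col in pwm]

def score(segment, pwm, m, background=0.25):
    """Conservation-weighted log-odds score of one candidate segment."""
    return sum(m_i * math.log2(col[base] / background)
               for base, col, m_i in zip(segment, pwm, m))

def best_site(sequence, pwm, m):
    """Slide a window over the sequence; return (start, score) of the top hit."""
    width = len(pwm)
    scored = [(score(sequence[s:s + width], pwm, m), s)
              for s in range(len(sequence) - width + 1)]
    top_score, start = max(scored)
    return start, top_score

# Toy aligned sites (hypothetical data, for illustration only).
sites = ["TTGACA", "TTGACT", "TTTACA", "TTGATA"]
pwm = build_pwm(sites)
m = conservation_index(pwm)

# The consensus TTGACA is planted at index 4 of this test sequence.
start, top = best_site("GGCCTTGACAGGCC", pwm, m)
print(start, round(top, 3))
```

Weighting each position by a conservation measure lets strongly conserved positions dominate the score, while variable positions contribute little. In practice a score threshold, chosen for example by cross-validation, would decide which windows are reported as predicted sites.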
Keywords/Search Tags: transcription factor binding sites, promoter, position weight matrices, measure of diversity, support vector machine