Prediction And Analysis Of Promoter Sequences Based On BP Neural Network

Posted on: 2021-02-27  Degree: Master  Type: Thesis
Country: China  Candidate: Z R Guo  Full Text: PDF
GTID: 2480306107481884  Subject: Electronic Science and Technology
Abstract/Summary:
Regulation of gene transcription is the most important step in the regulation of gene expression. The DNA sequence most closely associated with transcription is the promoter, an important cis-element among the functional sequences that control gene expression. Studying promoters is therefore of great significance for revealing the transcriptional units of specific pathways, understanding the mechanisms of gene regulation, and exploring gene structure; it is also a foundation for genome annotation. Promoter identification initially relied on traditional biological experiments, which are costly, time-consuming, and labor-intensive. In recent decades, the accumulation of massive biological data and the development of computational technology have driven research on promoter prediction based on computer algorithms. Many types of promoter features have been analyzed experimentally, and many computational recognition models for promoters have been proposed in succession. Most current prediction methods, however, have limitations. This study introduces new methods for describing sequence features, which are expected to improve the prediction performance of the model. Besides prediction accuracy, attention should also be paid to the model's generalization ability. Improving the universality of a model requires taking more promoter sequences as objects of analysis and analyzing sequence characteristics comprehensively, so as to obtain more general and more complete sequence information as the basis for classification.

This thesis therefore took promoter and non-promoter sequences from three prokaryotes and two eukaryotes as its subjects. Building on previous research, it introduced information theory and other signal processing methods to analyze promoter sequences and extract sequence features. Integrating multiple types of features introduces redundant information into the feature space. To eliminate this redundancy, the recursive feature elimination (RFE) algorithm was adopted for feature selection. Promoter classification models for the five species were then constructed using a BP neural network. Finally, the classification performance of the constructed classifiers was evaluated by 5-fold cross-validation.

On the prokaryotic benchmark test set, 5-fold cross-validation gave accuracy and AUC values of 0.755 and 0.814 for E. coli, 0.831 and 0.903 for B. subtilis, and 0.788 and 0.916 for P. aeruginosa. In addition, a cross-species experiment was conducted to test the generalization ability of the model; the accuracy and AUC for the three prokaryotic species were 0.819 and 0.804. On the eukaryotic benchmark test set, the accuracy and AUC were 0.864 and 0.940 for human and 0.782 and 0.825 for mouse, and the accuracy and AUC for the two eukaryotes were 0.823 and 0.884. The results indicate that the features obtained in this thesis through information theory and other methods are effective for distinguishing promoters from non-promoters, and that the model has good generalization ability. This contributes to revealing more universal characteristics of promoters and to developing a more robust promoter prediction model. It also promotes cross-species prediction analysis and extends the application of promoter classification models.
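
The abstract does not give the exact feature definitions or the evaluation code, so the following is only a minimal sketch of the kind of computation involved. It assumes that one information-theoretic feature could be the Shannon entropy of a sequence's k-mer composition, and it computes AUC directly as the fraction of promoter/non-promoter pairs ranked correctly. The function names `kmer_entropy` and `auc`, the choice of k, and the toy sequences are illustrative assumptions, not the thesis's actual method or data.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=2):
    """Shannon entropy (in bits) of the k-mer frequency distribution of seq.

    Assumption: this is one plausible information-theoretic feature;
    the thesis's real feature set is not specified in the abstract.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy sequences invented for illustration; not taken from any benchmark.
    sequences = ["TATAATGCGCTTGACA", "AAAAAAAAAAAAAAAA"]
    for s in sequences:
        print(s, round(kmer_entropy(s, k=2), 3))
```

In a full pipeline, features like this would be stacked into a matrix, pruned by feature selection, and passed to the classifier, with `auc` (or a library equivalent) applied to the held-out fold in each round of cross-validation.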
Keywords/Search Tags: Sub-sequence correlation, Sequence entropy analysis, BP neural network, Classification and prediction