A Position-specific Encoding Algorithm Of Nucleotide Sequences For Detecting Enhancers

Posted on: 2022-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: X C Mu
Full Text: PDF
GTID: 2480306329989609
Subject: Computational Mathematics
Abstract/Summary:
Enhancers are short genomic regions that typically exert tissue-specific regulation of remote coding regions. Enhancers can be observed in both prokaryotic and eukaryotic genomes. Accurate identification of enhancer fragments contributes to a better understanding of transcriptional regulation mechanisms. In this thesis, the position information of each k-mer (SeqPose) is introduced into the DNA sequence encoding strategy, and an enhancer classifier, spEnhancer, is constructed by combining a bidirectional long short-term memory (Bi-LSTM) neural network with an attention mechanism. The first layer of the proposed classifier distinguishes enhancers from non-enhancers, and the second layer evaluates the transcriptional regulation strength of the detected enhancers. Unlike existing studies, which focus on the statistical characteristics of the bases in a DNA sequence, this study does not rely on such statistical characteristics. We assume that different DNA feature vectors may have different effects on the classification model, so an attention mechanism is introduced to assign different weights to these feature vectors. The proposed model is compared with existing models on the same data set. Leave-one-out validation on the training data set shows that spEnhancer performs comparably to three existing classification models. For the binary classification problem on the independent test data set, spEnhancer achieved the best overall performance on the stability index, the Matthews correlation coefficient (MCC). The experimental data show that the SeqPose strategy helps improve the accuracy of classification models based on DNA sequence feature vectors. The proposed spEnhancer can serve as a complementary model to existing studies, especially for unknown enhancers not included in the training data set.
Keywords/Search Tags:Two-tier classification model, SeqPose features, Feature selection, Word vector model, Attention mechanism
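
The abstract describes attaching position information to each k-mer before encoding but does not give the exact scheme. The following is a minimal sketch of one way this could look, assuming overlapping k-mers, 0-based start positions, and a "KMER@position" token format; the function names, the default k = 3, and the token format are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple


def positioned_kmers(seq: str, k: int = 3) -> List[Tuple[int, str]]:
    """Return (start position, k-mer) pairs for every overlapping k-mer.

    Assumption: positions are 0-based and k-mers overlap with stride 1.
    A sequence shorter than k yields an empty list.
    """
    seq = seq.upper()
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def to_tokens(seq: str, k: int = 3) -> List[str]:
    """Turn a DNA sequence into position-tagged tokens such as 'ACG@0'.

    Assumption: the 'KMER@position' format is illustrative only. Tokens of
    this kind could then be fed to a word vector model, so that the same
    k-mer at different positions receives different embeddings.
    """
    return [f"{kmer}@{pos}" for pos, kmer in positioned_kmers(seq, k)]


if __name__ == "__main__":
    print(to_tokens("ACGTA", k=3))
```

Under these assumptions, the same k-mer occurring at two positions produces two distinct tokens, which is what lets a downstream embedding and Bi-LSTM model use position as well as composition.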