
Prediction And Application Of DNA Modification Sites Based On Deep Learning

Posted on: 2022-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zhang
GTID: 2480306770491074
Subject: Automation Technology
Abstract/Summary:
DNA modifications carry rich biological information; they regulate gene expression and participate in many cellular activities. Cells contain many different DNA modifications, among which DNA N4-methylcytosine (4mC) and DNA N6-methyladenine (6mA) are two common types. 4mC plays an important role in genome stability, recombination, and evolution, while 6mA has important regulatory functions in the genome. As research has deepened, traditional experimental methods and conventional machine learning methods can no longer meet the needs of DNA modification site prediction, and finding more efficient prediction methods has become a research hotspot. Deep learning can help improve the efficiency of modification site prediction. This thesis studies the application of deep learning methods to the prediction of DNA modification sites. The research contents are as follows:

1. A method for predicting DNA N4-methylcytosine sites based on a gated recurrent unit (GRU) and a deep neural network (DNN), called i4mCGD, is proposed. First, the DNA sequence is encoded using trinucleotide composition, electron-ion interaction pseudopotentials of nucleotides, and position-specific trinucleotide propensity. Second, the three feature vectors are fused, and the fused feature set is optimized with the mutual information method. Finally, the optimal feature subset is fed into GRU-D, a deep learning framework composed of a GRU and a DNN. The model is evaluated with 10-fold cross-validation. The prediction accuracy of i4mCGD on the six datasets reaches 92.4%, 91.9%, 88.6%, 93.7%, 95.3%, and 97.3%, respectively. Compared with other advanced methods, i4mCGD achieves better prediction performance.

2. A method for predicting DNA modification sites based on a bidirectional gated recurrent unit (BiGRU) and a convolutional neural network (CNN), called 4mCi6mA-BGC, is proposed. First, features are extracted from sequence information and physicochemical properties using binary encoding, K-mer nucleotide frequency, pseudo K-tuple nucleotide composition, dinucleotide-based auto covariance, and the monoDiKGap theoretical description. Second, the five feature vectors are fused into a high-dimensional feature space. Third, the elastic net is used for feature selection to obtain an optimized feature subset. Finally, this subset is fed into a deep learning framework consisting of a BiGRU and a CNN. The 10-fold cross-validation results show that prediction accuracy on the benchmark datasets is significantly better than that of existing prediction methods, reaching 97.1%, 95.9%, 96.1%, 98.7%, 97.7%, and 99.6%, respectively. The predictive ability of 4mCi6mA-BGC is further validated on independent datasets for rice and Arabidopsis thaliana, where it achieves the best prediction performance among the compared methods, with accuracies of 98.2% and 89.0%, respectively. The results indicate that 4mCi6mA-BGC is an effective method for identifying DNA modification sites.
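
The sequence-encoding and feature-fusion steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It shows only two of the listed descriptors, binary encoding and K-mer nucleotide frequency (trinucleotide composition is the case k = 3), and fuses them by simple concatenation. The function names, the A/C/G/T one-hot ordering, and the handling of ambiguous bases are all assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed base ordering; the thesis does not specify one


def kmer_frequency(seq, k=3):
    """Normalized frequency of every possible k-mer, in lexicographic order.

    Windows containing a base outside A/C/G/T are not counted, but they
    still contribute to the denominator (an illustrative choice).
    """
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    return [counts[m] / total for m in kmers]


def binary_encoding(seq):
    """One-hot encode each base as 4 bits; unknown bases become all zeros."""
    one_hot = {
        "A": [1, 0, 0, 0],
        "C": [0, 1, 0, 0],
        "G": [0, 0, 1, 0],
        "T": [0, 0, 0, 1],
    }
    encoded = []
    for base in seq.upper():
        encoded.extend(one_hot.get(base, [0, 0, 0, 0]))
    return encoded


def fuse_features(seq, k=3):
    """Fuse descriptors by concatenation (hypothetical fusion scheme)."""
    return binary_encoding(seq) + kmer_frequency(seq, k)


if __name__ == "__main__":
    features = fuse_features("ACGTACGT")
    print(len(features))  # 8 bases * 4 bits + 4**3 k-mer frequencies
```

In a pipeline like the one described, the fused vector would then pass through a feature-selection step (mutual information or the elastic net) before reaching the network; those stages are omitted here because the abstract gives no parameters for them.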
Keywords/Search Tags:DNA N4-methylcytosine, DNA N6-methyladenine, gated recurrent unit, deep neural network, bidirectional gated recurrent unit, convolutional neural network