Research on Cross-species m6A Modification Site Prediction Based on Deep Learning

Posted on: 2020-02-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Huang
Full Text: PDF
GTID: 2430330590962234
Subject: Software engineering
Abstract/Summary:
RNA post-transcriptional modification is the chemical modification of RNA after transcription, and it plays an important role in the cell cycle. More than 150 post-transcriptional modifications have been identified, of which N6-methyladenosine (m6A) is a common type, widely found in mammals, Saccharomyces cerevisiae, Arabidopsis and other species. N6-methyladenosine is a reversible modification that regulates the localization, transcription, splicing and stability of RNA. It is also associated with diseases such as tumors and obesity. Accurately identifying methylation sites from RNA sequences is therefore of great significance to basic research and drug development. Traditional methods of identifying m6A sites rely on biochemical experiments, which are time-consuming, expensive and small in scale. In recent years, researchers have developed several machine learning-based m6A site predictors. However, each of these applies to a single species and has limited prediction accuracy, so a high-precision cross-species m6A site prediction model is needed. This thesis studies the prediction of m6A sites. The main work is as follows:
(1) The sequence-based m6A site prediction problem is studied, and a novel RNA sequence feature extraction method, Enhanced Nucleic Acid Composition (ENAC), is proposed. The method slides a window along the sequence and calculates the frequency of each nucleotide within each window. It fuses the local and global information of the sequence and better expresses the characteristics of the RNA sequence surrounding the modification site. A random forest m6A site prediction model is built on ENAC features. The experimental results show that ENAC effectively improves the prediction of N6-methyladenosine sites compared with common feature extraction methods.
(2) Deep learning is applied to the prediction of m6A sites: one prediction model based on the unidirectional Gated Recurrent Unit (UGRU) and another based on the bidirectional Gated Recurrent Unit (BGRU) are proposed. The experimental results show that the BGRU model gives better prediction results on multiple species.
(3) The BGRU model and the ENAC-based random forest model are integrated through logistic regression to construct BERMP, a high-precision cross-species m6A site prediction model. The experimental results show that, on multiple species, the prediction performance of BERMP is superior to that of existing single-species m6A site prediction methods.
(4) An online prediction service platform for BERMP is provided, which is freely available to researchers (http://www.bioinfogo.org/bermp).
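The sliding-window idea behind ENAC can be illustrated with a minimal Python sketch. The abstract does not state the window size, the alphabet handling, or the order of the output features, so the default window of 5, the ACGU alphabet, the T-to-U conversion, and the function name `enac` are all assumptions made for illustration, not the thesis's implementation.

```python
from typing import List

# Assumed nucleotide alphabet for RNA; the thesis may order or encode it differently.
ALPHABET = "ACGU"


def enac(sequence: str, window: int = 5) -> List[float]:
    """Return per-window nucleotide frequencies for an RNA sequence.

    Hypothetical helper: a window of `window` nucleotides slides along the
    sequence one position at a time, and for each window the fraction of
    A, C, G and U is recorded. The result has
    (len(sequence) - window + 1) * 4 values.
    """
    # Accept DNA-style input by mapping T to U (an assumption of this sketch).
    seq = sequence.upper().replace("T", "U")
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")

    features: List[float] = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        for nucleotide in ALPHABET:
            # Frequency of this nucleotide inside the current window.
            features.append(chunk.count(nucleotide) / window)
    return features


if __name__ == "__main__":
    # One window covering the whole 5-nt sequence: 2 A, 1 C, 1 G, 1 U.
    print(enac("AAGCU", window=5))  # → [0.4, 0.2, 0.2, 0.2]
```

A vector built this way could be passed to any standard classifier, such as a random forest, as the feature representation of the sequence surrounding a candidate site.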
Keywords/Search Tags:Bioinformatics, Deep learning, bidirectional Gated Recurrent Unit, N6-methyladenosine, Random forest