
Deep Learning-based RNA Structure Feature And Function Prediction

Posted on: 2022-05-07 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: S S Sun
GTID: 1480306518498474 | Subject: Bioinformatics
Abstract/Summary:
In recent years, the number of experimentally determined RNA three-dimensional structures has grown relatively slowly. This has limited our understanding of the mechanisms behind RNA biological functions and has hindered the development of small-molecule drugs that target RNA. There is therefore an urgent need for effective computational methods to predict RNA three-dimensional structure, yet such prediction remains a major challenge. Drawing on research in protein three-dimensional structure prediction, this thesis uses existing three-dimensional structure data to predict RNA three-dimensional structural features and RNA function. It develops two RNA structural feature prediction algorithms and one RNA function prediction algorithm. For structural features, it presents two deep learning methods: RNAsol, which predicts RNA solvent accessibility, and RNAcontact, which predicts RNA inter-nucleotide contact maps. For function, it presents RNALigands, a method for predicting the small-molecule ligands of RNA.

First, RNAsol is a deep learning model based on long short-term memory (LSTM) neural networks. It builds a position-specific sequence profile of the RNA from an improved multiple sequence alignment and adds the background frequencies of the corresponding nucleotides as further features. In tests, RNAsol achieved precisions of 0.43 on protein-bound RNA and 0.26 on non-protein-bound RNA, which are 26.5% and 136.4% higher, respectively, than those of the previous method. When the training set was expanded to include both types of RNA, the precisions rose to 0.49 and 0.46, respectively. The improved accuracy of RNAsol is attributed mainly to two factors: the improved position-specific profile and the construction and optimization of the LSTM network.

Second, RNAcontact is a deep learning model based on a residual neural network. It is the first method to construct RNA covariance features from multiple sequence alignments; these are combined with the RNA secondary structure to form the input features of the network. On an independent test set, the precision of the top L/10 and top L predicted contacts (where L is the length of the RNA) reached 0.8 and 0.6, respectively, much higher than that of other co-evolution-based methods. Analysis shows that about one third of the correctly predicted nucleotide contacts are not base pairs in the secondary-structure sense; such contacts are essential for determining RNA three-dimensional structure. This thesis further shows that predicted nucleotide contacts can serve as distance restraints to guide RNA three-dimensional structure modeling: models built with predicted contacts were more accurate than models built without them.

Finally, RNALigands, the RNA small-molecule ligand prediction algorithm, is a database search method. This thesis first established the first database of interactions between RNA motifs and small-molecule ligands, then proposed a motif extraction algorithm and a motif alignment algorithm, and on this basis developed a search-based predictor of small-molecule ligands for RNA. Given an input RNA sequence, the predictor predicts its secondary structure, extracts its motifs, searches the database for similar motifs using the motif alignment algorithm, and thereby obtains potential small-molecule ligands for the input RNA. The utility of the algorithm was demonstrated by querying the α-synuclein mRNA 5' UTR sequence and finding potential matches.
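The abstract does not spell out how RNAsol's profile features are computed. As a rough illustration only, the sketch below turns an alignment into per-position log-odds scores with a pseudocount and appends the background frequency of the query nucleotide at each position. The pseudocount, the log-odds form, treating the first row as the query, and the function names `background_freqs` and `profile_features` are all assumptions, not RNAsol's actual feature pipeline.

```python
import math
from typing import List

NUCS = "ACGU"  # upper-case nucleotides; gaps and other symbols are ignored


def background_freqs(msa: List[str], pseudo: float = 1.0) -> List[float]:
    """Overall A/C/G/U frequencies across the whole alignment (assumed background)."""
    counts = [pseudo] * len(NUCS)
    for seq in msa:
        for ch in seq:
            if ch in NUCS:
                counts[NUCS.index(ch)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def profile_features(msa: List[str], pseudo: float = 1.0) -> List[List[float]]:
    """Per position: 4 log-odds scores plus the background frequency of the query base.

    Hypothetical layout: msa[0] is the query; all rows must have equal length.
    """
    if not msa or any(len(s) != len(msa[0]) for s in msa):
        raise ValueError("alignment must be non-empty with equal-length rows")
    bg = background_freqs(msa, pseudo)
    query = msa[0]
    features = []
    for pos in range(len(query)):
        # Column counts with a pseudocount so no frequency is ever zero.
        counts = [pseudo] * len(NUCS)
        for seq in msa:
            ch = seq[pos]
            if ch in NUCS:
                counts[NUCS.index(ch)] += 1
        total = sum(counts)
        log_odds = [math.log2((c / total) / b) for c, b in zip(counts, bg)]
        q = query[pos]
        bg_query = bg[NUCS.index(q)] if q in NUCS else 0.0
        features.append(log_odds + [bg_query])
    return features
```

Each row of the result could then be fed, position by position, to a sequence model such as an LSTM.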
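RNAcontact's covariance features are described only briefly. One common construction, used by protein contact predictors such as DeepCov, is the covariance of one-hot encoded alignment columns, C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b), where f_i(a) is the frequency of symbol a in column i and f_ij(a,b) the joint frequency of the pair. The sketch assumes this form, an ACGU-plus-gap alphabet and no sequence weighting; whether RNAcontact uses exactly this definition is not stated in the abstract, and `covariance_features` is a hypothetical name.

```python
from typing import List

ALPHABET = "ACGU-"  # assumed alphabet: four nucleotides plus gap


def covariance_features(msa: List[str]) -> List[List[List[float]]]:
    """Return an L x L x 25 nested list with C_ij(a,b) = f_ij(a,b) - f_i(a) * f_j(b)."""
    n = len(msa)
    length = len(msa[0])
    q = len(ALPHABET)
    # Encode each symbol as an index; unknown symbols are treated as gaps.
    idx = [[ALPHABET.index(ch) if ch in ALPHABET else q - 1 for ch in seq] for seq in msa]

    # Single-column frequencies f_i(a), unweighted.
    fi = [[0.0] * q for _ in range(length)]
    for row in idx:
        for i, a in enumerate(row):
            fi[i][a] += 1.0 / n

    cov = [[[0.0] * (q * q) for _ in range(length)] for _ in range(length)]
    for i in range(length):
        for j in range(length):
            # Joint frequencies f_ij(a, b) for columns i and j.
            pair = [0.0] * (q * q)
            for row in idx:
                pair[row[i] * q + row[j]] += 1.0 / n
            for a in range(q):
                for b in range(q):
                    cov[i][j][a * q + b] = pair[a * q + b] - fi[i][a] * fi[j][b]
    return cov
```

For two perfectly covarying columns, such as an A-U / U-A pair that flips together across the alignment, the matching entries become positive and the mismatching ones negative, which is the co-evolution signal a residual network can learn from.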
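Top L/k precision, the metric quoted for RNAcontact, is the fraction of true contacts among the L/k highest-scoring nucleotide pairs. A minimal sketch follows. The minimum sequence separation (`min_sep`) and the definition of a true contact (for example, a distance cutoff) are assumptions here, since the abstract does not give them; the function names are hypothetical.

```python
from typing import List, Set, Tuple


def top_k_precision(probs: List[List[float]], true_contacts: Set[Tuple[int, int]],
                    k: int, min_sep: int = 4) -> float:
    """Precision of the k highest-scoring pairs (i, j) with j - i >= min_sep."""
    length = len(probs)
    candidates = [
        (probs[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    candidates.sort(reverse=True)  # highest predicted probability first
    top = candidates[:k]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (i, j) in true_contacts)
    return hits / len(top)


def top_l_over_n_precision(probs: List[List[float]], true_contacts: Set[Tuple[int, int]],
                           n: int, min_sep: int = 4) -> float:
    """Top L/n precision, where L is the RNA length (n=10 gives top L/10, n=1 gives top L)."""
    k = max(1, len(probs) // n)
    return top_k_precision(probs, true_contacts, k, min_sep)
```

Usage: with a 10-nt RNA whose highest-scoring pair is a true contact, `top_l_over_n_precision(probs, contacts, 10)` scores only that single pair and returns 1.0, while `n=1` scores the top 10 pairs.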
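RNALigands extracts motifs from a predicted secondary structure before searching its database. The thesis's motif definitions are not given in the abstract; as a hypothetical illustration restricted to hairpin loops, the sketch parses dot-bracket notation with a stack and reports each loop together with its closing base pair. Internal loops, bulges, junctions and the motif alignment step are not covered, and `parse_pairs` and `hairpin_motifs` are illustrative names only.

```python
from typing import List, Tuple


def parse_pairs(structure: str) -> List[int]:
    """Map each position of a dot-bracket string to its partner index (-1 if unpaired)."""
    partner = [-1] * len(structure)
    stack: List[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return partner


def hairpin_motifs(sequence: str, structure: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, subsequence) for each hairpin loop, including its closing pair."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    partner = parse_pairs(structure)
    motifs = []
    for i, j in enumerate(partner):
        # A hairpin is closed by pair (i, j), i < j, with no paired base between them.
        if j > i and all(partner[k] == -1 for k in range(i + 1, j)):
            motifs.append((i, j, sequence[i:j + 1]))
    return motifs
```

For example, `hairpin_motifs("GGGAAACCCGGAAACC", "(((...)))((...))")` yields two hairpins, each spelling `GAAAC`; a database search would then compare such fragments against stored motifs.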
Keywords/Search Tags: deep learning, RNA solvent accessible surface area, RNA inter-nucleotide contact map, RNA-small molecule interactions