
Research On The Prediction Model Of Protein-RNA Binding Sites Based On The Two-dimensional Fusion Of Graph Convolution Neural Network

Posted on: 2024-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: J M Zhang
GTID: 2530307064485814
Subject: Software engineering
Abstract/Summary:
Unraveling the interaction mechanism between proteins and RNA is fundamental to understanding many biological activities and to designing new drugs. It requires a full characterization of the molecular-level determinants of binding specificity between proteins and RNAs. With the rapid development of sequencing technology, the amount of protein-RNA interaction data is growing quickly, which makes large-scale computational prediction of protein-RNA binding sites feasible. Predicting and analyzing these binding sites deepens our understanding of how proteins and RNAs interact. It also allows the predictive models to be extended to other biological processes.

Many computational biology studies have addressed protein-RNA binding site prediction, using either traditional machine learning or deep learning. Traditional machine learning methods require manually designed features that depend on domain knowledge, which makes them difficult to implement. Deep learning methods are stronger at feature extraction and learning. However, existing deep learning work mainly predicts whether a sequence pair interacts, or locates binding sites within a single sequence fragment. It cannot predict whether a specific amino acid in the protein binds a specific nucleotide in the RNA. In addition, existing methods mostly extract protein and RNA features at the sequence level and seldom consider their spatial structure.

To address these problems, this thesis proposes a two-dimensional fusion model for protein-RNA binding site prediction that uses both the sequence features and the spatial structure features of RNA and protein. The work approaches the problem from two sides: the data source and the network model.

On the data side, this thesis collects and curates protein-RNA complexes from the PDB database and extracts the sequence features of RNA and protein. It then constructs secondary-structure adjacency matrices and spatial structure features for both molecules. The sequence and structure features of protein and RNA together serve as the data source for the prediction model.

On the model side, this thesis proposes a two-dimensional fusion deep neural network, Dstru GCN, which combines a graph convolutional neural network (GCN) and a long short-term memory network (LSTM). The network learns from the sequence and structure information of protein and RNA, then fuses the learned features of both molecules to predict a protein-RNA binding site matrix.

Under ten-fold cross-validation, the proposed model achieves an average AUC of 0.965 on the independent test set. Comparative experiments verify the effectiveness of the graph convolutional neural network for protein-RNA binding site prediction. These results show that the Dstru GCN model can effectively predict protein-RNA binding sites. Its distinguishing feature is that it combines the sequence and spatial structure characteristics of RNA and protein, and through training it learns the binding rules between specific amino acids and nucleotides. This offers new ideas and methods for in-depth research on RNA-protein interactions.
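The abstract does not spell out how the graph branch or the pairwise output is computed. The sketch below is a minimal illustration under stated assumptions, not the Dstru GCN implementation. It does three things:

- It builds an RNA graph from a dot-bracket secondary structure, using backbone neighbours plus base-pair edges, as one possible "secondary structure adjacency matrix".
- It applies one standard symmetric-normalized GCN propagation step, ReLU(D^-1/2 (A + I) D^-1/2 X W).
- It scores every amino acid-nucleotide pair with a sigmoid of the dot product of the two per-residue embeddings, giving a binding site matrix.

The helper names, one-hot encoding, weights, and protein embeddings are all hypothetical, and the LSTM branch is omitted.

```python
import math


def dot_bracket_to_adjacency(structure: str) -> list[list[float]]:
    """RNA graph from a dot-bracket string: backbone edges plus base-pair edges.

    Only '(' and ')' are treated as pairs; every other character is unpaired.
    Assumes the brackets are balanced.
    """
    n = len(structure)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):  # covalent backbone between sequence neighbours
        adj[i][i + 1] = adj[i + 1][i] = 1.0
    stack: list[int] = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()  # partner of the most recent unmatched '('
            adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_layer(adj, features, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W) in plain Python."""
    n, out_dim = len(adj), len(weight[0])
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loops
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    xw = [[sum(x[k] * weight[k][c] for k in range(len(x))) for c in range(out_dim)] for x in features]
    out = [[sum(norm[i][j] * xw[j][c] for j in range(n)) for c in range(out_dim)] for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]


def binding_matrix(protein_emb, rna_emb):
    """Pairwise scores: entry (i, j) = sigmoid(<protein residue i, nucleotide j>)."""
    return [
        [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(p, r)))) for r in rna_emb]
        for p in protein_emb
    ]


# Hypothetical toy inputs: a 7-nt hairpin (two G-C pairs) and a 3-residue protein.
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "U": [0, 0, 0, 1]}
rna_adj = dot_bracket_to_adjacency("((...))")
rna_x = [ONE_HOT[base] for base in "GGAAACC"]
w = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6], [0.2, 0.1]]  # made-up 4 x 2 weight matrix
rna_h = gcn_layer(rna_adj, rna_x, w)               # 7 x 2 nucleotide embeddings
protein_h = [[0.3, 0.1], [0.0, 0.8], [0.5, 0.5]]   # stand-in for a protein-branch output
scores = binding_matrix(protein_h, rna_h)          # 3 x 7 binding site matrix
```

In a full model, the protein side would have its own graph branch and an LSTM would capture sequence order. The dot product here is only one simple way to turn two sets of per-residue embeddings into an amino acid x nucleotide matrix.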
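The evaluation can be sketched in the same hedged way. The code below computes ROC AUC as the Mann-Whitney statistic over all (amino acid, nucleotide) pairs of a flattened binding site matrix. It also splits sample indices into ten folds. The following are all assumptions for illustration, since the abstract does not give the exact protocol: scoring AUC at the pair level, using contiguous unshuffled folds, the function names, and the toy data.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def flatten(matrix: list[list[float]]) -> list[float]:
    """Row-major flattening of a (protein length x RNA length) matrix."""
    return [v for row in matrix for v in row]


def k_fold_indices(n: int, k: int = 10) -> list[list[int]]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Toy example: a 2 x 3 binding matrix (hypothetical labels and predicted scores).
true_pairs = [[1, 0, 0], [0, 1, 0]]
pred_pairs = [[0.9, 0.2, 0.4], [0.3, 0.8, 0.1]]
auc = roc_auc(flatten(true_pairs), flatten(pred_pairs))  # 1.0: every positive outranks every negative
folds = k_fold_indices(25, k=10)  # e.g. 25 complexes -> 5 folds of 3 and 5 folds of 2
```

In a cross-validation loop, each fold would in turn be held out, a model trained on the rest, and the per-fold AUCs averaged. Real pipelines usually shuffle, or group by complex, before splitting.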
Keywords/Search Tags: Protein-RNA Binding Sites, Graph Convolutional Neural Network, Long Short-Term Memory Network, Sequence Information, Structural Information