Computational Prediction Of RNA-Protein Interactions

Posted on: 2019-02-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J F Zheng
Full Text: PDF
GTID: 1360330596959571
Subject: Theoretical Physics
Abstract/Summary:
Protein-RNA interactions play an important role in life processes and are involved in post-transcriptional gene regulation. Abnormal regulation can lead to various diseases. Because of this importance, many experimental techniques have been developed to study protein-RNA interactions. However, these techniques have inherent limitations (for example, mRNA capture technology can only extract RBPs that interact with mRNA) and can require significant time and money, so computational methods have been developed to predict protein-RNA interactions. This dissertation addresses three problems: constructing three-dimensional models of protein-RNA complexes, predicting binding sites on proteins and RNA, and predicting RNA-binding proteins.

In modeling the three-dimensional structure of protein-RNA complexes, we use structural and sequence alignment approaches to align complex structures. The similarity of two complexes is defined as the minimum of the RNA similarity and the protein similarity. We compared the ability of different algorithms to detect three-dimensional protein-RNA complex templates. An all-to-all alignment of protein-RNA complexes indicated that there is a transition point at which the interaction mode changes from dissimilar to similar. Moreover, the structural alignment approach detected more templates than the sequence alignment approach. Based on structural alignment, we developed PRIME to construct three-dimensional models of protein-RNA interactions (proteins aligned with TMalign, RNAs aligned with SARA). Tested on the protein-RNA docking benchmark dataset, the success rate of PRIME was 15% higher than that of the free docking method 3dRPC (for the top 10 models). In addition, PRIME runs three times faster than 3dRPC.

While developing PRIME, we found that SARAscore, which describes the structural similarity of RNA, is a size-dependent score. This caused PRIME to miss some templates, so we developed RMscore, which does not depend on RNA length, to describe the structural similarity of RNA. Based on RMscore, we developed the RMalign program for RNA three-dimensional structure alignment. In tests, RMscore described the similarity of RNA structures better than SARAscore, and RMalign performed as well as the state-of-the-art RNA structural alignment method ESA-RNA. We therefore replaced the original RNA structure alignment method in PRIME with RMalign and updated PRIME to version 2.0. Tested on the protein-RNA docking benchmark set, PRIME2.0 had a 10% higher success rate than PRIME (top 1 prediction).

When PRIME was tested on a genome scale, we found that many RNAs have no available three-dimensional structure, so the three-dimensional structures of their protein-RNA interactions cannot be constructed. To extend the templates to the genome scale, we propose a 3D2D model to represent protein-RNA interactions. The 3D2D model represents the RNA by its secondary structure and the protein by its three-dimensional structure. On this basis, we developed the 3D2DPRIME method for constructing three-dimensional structures, with the 3D2D-score representing the similarity between different protein-RNA interaction pairs. In testing, the success rate of 3D2DPRIME was only 0.02 lower than that of PRIME, which shows that the 3D2D model and the 3D2D-score can also describe the similarity between protein-RNA complexes. We therefore used the 3D2D model to predict binding sites at the PDB scale, where the highest MCC obtained was 0.70. When the approach was extended to the genome scale, the 3D2D model could still predict binding sites, but the 3D2D-score did not select the models well.

Finally, we developed Deep-RBPPred, a method for predicting RNA-binding proteins based on deep learning. Deep-RBPPred uses some of the amino acid characteristics described in RBPPred, including hydrophobicity, polarity, normalized van der Waals volume, polarizability, and side-chain charge and polarity. An 11-layer convolutional neural network was built with TensorFlow to train the model. To compare the performance of different learning algorithms, we also built an SVM model. Testing led to the following conclusions: the deep learning model outperforms the SVM model, and within the same machine learning method the balanced and unbalanced models perform the same. In a comparison of different methods, Deep-RBPPred (MCC = 0.78) and RBPPred (MCC = 0.76) perform similarly; the MCC of Deep-RBPPred is 0.02 higher than that of RBPPred and 0.36 higher than that of RNAPred.
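
Two quantities used in the abstract can be illustrated with a short sketch: the complex similarity, taken as the minimum of the protein and RNA similarities, and the Matthews correlation coefficient (MCC) used to report binding-site and RBP prediction quality. This is not the dissertation's code. The function names are hypothetical, the similarity scores are assumed to be already computed by some alignment tool and to lie on a common scale, and the MCC follows the standard confusion-matrix definition.

```python
import math


def complex_similarity(protein_score: float, rna_score: float) -> float:
    """Similarity of two protein-RNA complexes (hypothetical helper).

    Following the abstract, the complex is only as similar as its least
    similar component, so the minimum of the two scores is returned.
    Both scores are assumed to be on the same scale.
    """
    return min(protein_score, rna_score)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    chosen here as an assumption).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the dissertation.
    print(complex_similarity(0.80, 0.45))
    print(mcc(tp=45, tn=44, fp=6, fn=5))
```

Taking the minimum means a template is accepted only when both the protein and the RNA resemble the query, which matches the abstract's description of complex similarity. MCC ranges from -1 to 1 and stays informative when positive and negative classes are unbalanced.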
Keywords/Search Tags: alignment, RMscore, RMalign, template, deep learning, RNA binding protein, binding sites, 3D2D model, docking