Predicting Non-coding RNA-protein Interactions By Machine Learning

Posted on: 2020-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: S P Cheng
Full Text: PDF
GTID: 2480306215966089
Subject: Biomedical engineering
Abstract/Summary:
Noncoding RNAs (ncRNAs) play an important part in many cellular processes, such as RNA modification and processing, virus replication, and human disease. ncRNAs usually function by binding to proteins, so identifying noncoding RNA-protein interactions (ncRPIs) is key to studying ncRNA. Current experimental methods for identifying ncRPIs are expensive and time-consuming, so in this work we developed a computational method. Features are extracted from the sequences of the ncRNA and the protein, and the raw features are then preprocessed by training a Convolutional Autoencoder with four convolutional layers. The autoencoder reduces the dimensionality of the raw features and mines hidden information from the raw data, which improves predictive accuracy. In the classification stage, three machine learning algorithms (Random Forest (RF), Extreme Gradient Boosting (XGBoost), and LightGBM) are trained with grid search to identify ncRNA-protein interactions. All three methods achieve high performance: on RPI369 the accuracy is 0.791 (RF), 0.791 (XGBoost), and 0.757 (LightGBM), and on RPI488 it is 0.908 (RF), 0.918 (XGBoost), and 0.918 (LightGBM). The three models obtained higher AUC on large-scale data: on RPI1807 all three reach an AUC of 0.99, and the lowest AUC among the three models is 0.87 on RPI2241 and 0.81 on RPI13254. These results show that our methods can predict whether an ncRNA and a protein interact and can be used to study ncRPI.
Keywords/Search Tags: ncRNA-protein interaction, Convolutional Autoencoder, Random Forest, Extreme Gradient Boosting, LightGBM
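
The abstract says that features are extracted from the ncRNA and protein sequences but does not name the encoding. As a rough illustration only, the sketch below assumes a normalized k-mer frequency encoding, which is one common choice for sequence-based features. The function names, the alphabets, and the values of k are hypothetical and are not taken from the thesis. A vector like this would be the raw input that is later compressed by the autoencoder and passed to the classifiers.

```python
from itertools import product

# Assumed alphabets; the thesis abstract does not specify them.
RNA_ALPHABET = "ACGU"
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def kmer_frequencies(seq, alphabet, k):
    """Return normalized k-mer frequencies of seq as a fixed-length list."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:  # skip k-mers that contain unknown symbols
            counts[pos] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def pair_features(rna_seq, protein_seq, rna_k=3, protein_k=2):
    """Concatenate RNA and protein k-mer frequencies into one raw feature vector."""
    rna_part = kmer_frequencies(rna_seq.upper(), RNA_ALPHABET, rna_k)
    protein_part = kmer_frequencies(protein_seq.upper(), PROTEIN_ALPHABET, protein_k)
    return rna_part + protein_part


if __name__ == "__main__":
    # Toy sequences, not taken from any RPI dataset.
    vector = pair_features("ACGUACGGAU", "MKVLAAGIV")
    print(len(vector))
```

With the assumed defaults (k = 3 for RNA, k = 2 for protein), every ncRNA-protein pair maps to a vector of the same length, 4^3 + 20^2 entries, regardless of sequence length, which is what a fixed-input model such as an autoencoder needs.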