Prediction Of RNA-protein Interactions Based On Machine Learning

Posted on: 2022-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
GTID: 2480306770991049
Subject: Automation Technology
Abstract/Summary:
RNA-protein interactions (RPI) play crucial roles in fundamental cellular physiological processes such as cell motility, chromosome replication, transcription and translation, and signal transduction, and are also associated with the causes of diseases. Predicting RPI can guide the exploration of cell biological function, disease intervention, and drug design, which is of great significance to the development of life science, medicine, and information science. Traditional biological experimental methods are time-consuming, labor-intensive, and poorly stable, so they cannot meet the needs of large-scale prediction tasks. There is therefore an urgent need to develop low-cost, high-efficiency computational methods for RPI prediction. This thesis combines a variety of machine learning methods to predict RPI. The research contents are as follows:

1. An RPI prediction method based on Stacking ensemble learning, RPI-MDLStack, is proposed. Firstly, the sequence information, physicochemical properties, and secondary structure information of RNA are extracted by fusing four methods, and the sequence information, physicochemical properties, and evolutionary information of proteins are extracted by fusing four methods. Secondly, the least absolute shrinkage and selection operator (LASSO) feature selection algorithm is applied to RPI prediction for the first time to retain the optimal feature subset. Then, following the Stacking ensemble strategy, three machine learning methods (multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF)) and two deep learning methods (gated recurrent unit (GRU) and deep neural network (DNN)) are combined as base classifiers and integrated with an SVM meta-classifier to build the prediction framework. Finally, under five-fold cross-validation, the accuracy of RPI-MDLStack on RPI488, RPI369, RPI2241, RPI1807, and RPI1446 reaches 96.7%, 87.3%, 94.6%, 97.1%, and 89.5%, respectively. The overall prediction accuracy reaches 97.8% in independent validation experiments. In addition, the RPI-MDLStack model is used to predict and visualize the RPI network. Experimental results show that RPI-MDLStack can provide insights for improving RPI prediction ability.

2. A method named RPI-CapsuleGAN is proposed for predicting RPI based on the generative adversarial capsule network and the convolutional block attention module. Firstly, k-mer, RNA secondary structure, monoMonoKGap, monoDiKGap, and PseSSC are fused to characterize RNA sequence information, and CT, protein secondary structure, evolutionary difference position-specific score matrix, reduced position-specific score matrix, and GTPC are fused to characterize protein sequence information. Secondly, the elastic net (EN) is adopted as the feature selection method to remove redundant information and noise irrelevant to the task. Finally, the convolutional block attention module (CBAM) is introduced into the generative adversarial capsule network (CapsuleGAN) for the first time to construct an RPI prediction framework, in which each input feature is assigned a weight to optimize the feature space. In the five-fold cross-validation test, the prediction accuracy of RPI-CapsuleGAN on RPI488, RPI369, RPI2241, RPI1807, and RPI1446 is 97.1%, 88.8%, 92.5%, 97.3%, and 87.8%, respectively. On the five NPInter227 test datasets constructed in this thesis, the prediction accuracy reaches 97.38%, 96.48%, 97.38%, 97.81%, and 97.15%, respectively, which is superior to the other classification algorithms compared. In addition, RPI-CapsuleGAN achieves good results on independent test datasets, and the RPI network from the Saccharomyces cerevisiae dataset is predicted and analyzed. Extensive experiments show that RPI-CapsuleGAN can accurately predict RPI networks.
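As an illustration of the k-mer encoding named among the RNA features, the following is a minimal sketch and not the thesis's implementation. The function name `kmer_frequencies`, the default `k=3`, the ACGU alphabet, and the choice to normalize counts into frequencies are all assumptions made for this example; the abstract does not state these details.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Encode an RNA sequence as normalized k-mer frequencies.

    Hypothetical helper for illustration only. The vector has
    len(alphabet) ** k entries in lexicographic k-mer order.
    """
    # Treat DNA-style input as RNA (assumption: T is equivalent to U).
    seq = seq.upper().replace("T", "U")
    # Fixed feature order, so every sequence maps to the same layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing symbols outside the alphabet (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    vec = kmer_frequencies("ACGUACGU", k=2)
    print(len(vec), round(sum(vec), 6))
```

A vector like this would typically be concatenated with the other RNA and protein descriptors before feature selection; how the thesis weights or fuses the individual descriptors is not described in the abstract.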
Keywords/Search Tags: RNA-protein interactions, multi-information fusion, least absolute shrinkage and selection operator, elastic net feature selection, Stacking ensemble strategy, convolutional block attention module, generative adversarial capsule network