
Prediction Of Human Long Noncoding RNA-protein Interactions Based On Ensemble Strategy

Posted on: 2019-12-10  Degree: Master  Type: Thesis
Country: China  Candidate: H Hu  Full Text: PDF
GTID: 2394330545965918  Subject: Biomedical statistics
Abstract/Summary:
Noncoding RNA (ncRNA) was once thought to be useless. Since the 1990s, a growing number of new ncRNAs have been discovered, and scientists have gradually become aware of their important biological functions. Interactions between ncRNA and protein are crucial for many important cellular processes. In particular, the interaction between long noncoding RNA (lncRNA) and protein plays an important role in post-transcriptional gene regulation, including RNA splicing, translation, and signal transduction, and even in the progression of complex diseases. However, experimental verification of lncRNA-protein interactions is time-consuming and expensive. In recent years, statistical learning and machine learning algorithms have increasingly been applied in interdisciplinary fields that work with large-scale biological data, and a growing body of ncRNA-related bioinformatics data has accumulated. Together, these developments have made computational prediction of ncRNA-protein interactions a new research approach.

Although several existing models can be applied to predict ncRNA-protein interactions, they do not accurately characterize the interaction between human lncRNA and protein. In this study, we propose a model called HLPI-Ensemble, which is designed specifically to predict human lncRNA-protein interactions. HLPI-Ensemble consists of three ensemble models: HLPI-SVM Ensemble, HLPI-RF Ensemble, and HLPI-XGB Ensemble. Each is built from 9 sub-models of one machine learning algorithm: support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGB), respectively. The 9 sub-models of each algorithm are trained on 9 combinations of lncRNA and protein features. An average ensemble strategy and a linear ensemble strategy are then used to integrate the 9 sub-models of each algorithm into the corresponding ensemble model.

The 10-fold cross-validation results show that, on the test set, HLPI-SVM Ensemble, HLPI-RF Ensemble, and HLPI-XGB Ensemble achieve AUCs of 0.95, 0.96, and 0.96, respectively. To test the performance of HLPI-Ensemble more strictly, we used an independent test set to compare it with other prediction models. The results further indicate that HLPI-Ensemble performs well in predicting human lncRNA-protein interactions. HLPI-Ensemble is available at http://ccsipb.lnu.edu.cn/hlpiensemble/. In future work, we plan to study human noncoding RNA-protein interaction sites and their functions, and to provide theoretical guidance for research on long noncoding RNA.
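
The two ensemble strategies named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `average_ensemble` and `linear_ensemble`, the toy probabilities, and the weights are all assumptions made for illustration. The abstract does not say how the linear strategy obtains its weights, so here the caller supplies them and they are normalized to sum to 1. Each sub-model is assumed to output, for every lncRNA-protein pair, a probability that the pair interacts.

```python
from statistics import fmean
from typing import List, Sequence


def average_ensemble(sub_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average ensemble: the unweighted mean of the sub-model probabilities.

    sub_model_probs[i][j] is the probability that sub-model i assigns to sample j.
    """
    return [fmean(sample_probs) for sample_probs in zip(*sub_model_probs)]


def linear_ensemble(
    sub_model_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Linear ensemble: a weighted sum of the sub-model probabilities.

    The weights are hypothetical inputs (one per sub-model); they are
    normalized here so that the result stays in the range [0, 1].
    """
    if len(weights) != len(sub_model_probs):
        raise ValueError("need exactly one weight per sub-model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]
    return [
        sum(w * p for w, p in zip(normalized, sample_probs))
        for sample_probs in zip(*sub_model_probs)
    ]


if __name__ == "__main__":
    # Toy data: 3 sub-models (the thesis uses 9) scoring 2 lncRNA-protein pairs.
    probs = [
        [0.9, 0.2],
        [0.7, 0.4],
        [0.8, 0.0],
    ]
    print(average_ensemble(probs))
    print(linear_ensemble(probs, weights=[1.0, 1.0, 2.0]))
```

With equal weights, the linear ensemble reduces to the average ensemble, so the weights are what distinguish the two strategies.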
Keywords/Search Tags:lncRNA, protein, interaction prediction, machine learning, ensemble strategy