
The Study of miRNA Prediction Based on Matrix Completion and Active Learning

Posted on: 2022-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: L G Sun
Full Text: PDF
GTID: 2480306533972469
Subject: Control Science and Engineering
Abstract/Summary:
MicroRNAs (miRNAs) were long regarded as useless RNA because they do not encode proteins. Recent research has changed this view: miRNAs have been found to regulate about one third of human genes, and a growing number of experiments confirm that miRNAs play a key role in the development of diseases. Identifying miRNA-disease associations is therefore of great significance for the prevention, diagnosis and even treatment of diseases. In this thesis, we put forward two models for predicting miRNA-disease associations: NCMCMDA (miRNA-disease association prediction through neighborhood constraint matrix completion) and RFALMDA (miRNA-disease association prediction through random forest and active learning).

In the first model, NCMCMDA, we add a neighborhood constraint to the matrix completion algorithm, which provides a novel way of using similarity information to assist prediction. In validation, NCMCMDA achieves AUCs of 0.9086 and 0.8453 in global and local leave-one-out cross validation (LOOCV), respectively, and a mean AUC of 0.8943 with a standard deviation of 0.0015 in 5-fold cross validation. To test the sensitivity of NCMCMDA to inaccurate data, we randomly remove 20% and 30% of the nodes in the association network, run the same three kinds of cross validation, and repeat the procedure 10 times. With 20% and 30% removed, NCMCMDA achieves AUCs of 0.8934 and 0.8893 in global LOOCV, 0.8315 and 0.8315 in local LOOCV, and 0.8785 +/- 0.0026 and 0.8753 +/- 0.0029 in 5-fold cross validation, respectively. In the case studies, we select colon neoplasms, esophageal neoplasms and breast neoplasms as target diseases. As a result, 42, 49, 43 and 49 of the top 50 disease-associated miRNAs predicted by NCMCMDA are confirmed by databases and the literature.

In the second model, RFALMDA, we first build the training set and reduce the feature dimension by selecting discriminative features. We then choose negative samples under an active learning framework and combine them with the known associations to train a random forest classifier. In validation, RFALMDA achieves AUCs of 0.8992 in global LOOCV and 0.8231 in local LOOCV, and a mean AUC of 0.8984 with a standard deviation of 0.0093 in 5-fold cross validation. We also conduct three kinds of case studies to evaluate RFALMDA from different perspectives. The results show that 45, 48, 43 and 46 of the top 50 disease-associated miRNAs for lymphoma, lung neoplasms, prostate neoplasms and hepatocellular carcinoma, respectively, are confirmed by databases and the literature.
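The abstract reports AUC under global and local LOOCV without spelling out how the two differ. As the terms are commonly used in miRNA-disease prediction work, each known association is hidden in turn and re-scored; global LOOCV ranks it against every unknown miRNA-disease pair, while local LOOCV ranks it only against unknown pairs for the same disease. The sketch below illustrates that difference under stated assumptions. The names `candidates`, `loocv_auc` and `degree_score` are hypothetical, the scorer is a toy baseline rather than NCMCMDA or RFALMDA, and the "mean fraction of candidates outscored" is a simplified rank-based stand-in for the AUC the thesis reports.

```python
from typing import Callable, List, Tuple

Matrix = List[List[int]]                      # rows: miRNAs, columns: diseases
Scorer = Callable[[Matrix, int, int], float]  # (training matrix, miRNA, disease) -> score


def candidates(assoc: Matrix, disease: int, scope: str) -> List[Tuple[int, int]]:
    """Unknown pairs that a held-out association for `disease` is ranked against."""
    return [
        (m, d)
        for m, row in enumerate(assoc)
        for d, value in enumerate(row)
        if value == 0 and (scope == "global" or d == disease)
    ]


def loocv_auc(assoc: Matrix, score: Scorer, scope: str = "global") -> float:
    """Mean fraction of candidates that each held-out association outscores."""
    known = [(m, d) for m, row in enumerate(assoc) for d, v in enumerate(row) if v == 1]
    fractions = []
    for m, d in known:
        train = [row[:] for row in assoc]
        train[m][d] = 0                       # hide the test association
        held = score(train, m, d)
        cands = candidates(assoc, d, scope)
        if not cands:                         # nothing to rank against
            continue
        wins = 0.0
        for cm, cd in cands:
            other = score(train, cm, cd)
            wins += 1.0 if held > other else 0.5 if held == other else 0.0
        fractions.append(wins / len(cands))
    if not fractions:
        raise ValueError("no held-out association had any candidate to rank against")
    return sum(fractions) / len(fractions)


def degree_score(train: Matrix, mirna: int, disease: int) -> float:
    """Hypothetical toy scorer: miRNA degree plus disease degree in the training matrix."""
    return float(sum(train[mirna]) + sum(row[disease] for row in train))


# Toy 3 miRNA x 3 disease association matrix (made-up data, for illustration only).
toy = [[1, 1, 0],
       [1, 0, 0],
       [0, 1, 1]]
print("global:", loocv_auc(toy, degree_score, "global"))
print("local: ", loocv_auc(toy, degree_score, "local"))
```

Local LOOCV uses a much smaller candidate set per test sample, which is one plausible reason the local figures in the abstract sit below the global ones. Any real model would replace `degree_score` with a scorer that is retrained on `train` for each held-out association.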
Keywords/Search Tags:microRNA, disease, association prediction, matrix completion, active learning