| Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs (ncRNAs) that are more than 200 nucleotides (nt) in length and do not encode proteins. Current lncRNA-disease association prediction models generally share the following limitations: (1) data sparsity: few lncRNA-disease associations are known, so the association data are sparse; (2) prediction accuracy: constrained by the scarcity of known lncRNA-disease associations, many prediction models achieve only modest accuracy; (3) difficulty predicting diseases related to isolated lncRNAs; (4) difficulty predicting lncRNAs related to isolated diseases; (5) the negative sample problem. Because logistic matrix factorization is well suited to binary variables and sparse data, this paper proposes computational methods for predicting lncRNA-disease associations based on the logistic matrix factorization algorithm. The specific work of this paper is as follows: (1) This paper proposes using neighborhood regularized logistic matrix factorization (NRLMF) to predict lncRNA-disease associations (NRLMF-LDA). To address data sparsity, NRLMF uses logistic matrix factorization to model the interaction probability of each lncRNA-disease pair. To improve prediction accuracy, NRLMF exploits the observation that similar diseases tend to be associated with functionally similar lncRNAs: it makes full use of neighborhood information through neighborhood regularization during training and neighborhood smoothing during prediction. In addition, NRLMF can predict diseases related to isolated lncRNAs and lncRNAs related to isolated diseases. (2) Because the prediction performance of NRLMF is not good enough, especially for isolated lncRNAs and isolated diseases, a model combining dual network logistic matrix factorization with Bayesian optimization (DNILMF-BO) is proposed to predict lncRNA-disease associations. To improve prediction accuracy, the DNILMF-BO model inherits the advantages of NRLMF and improves on it. The
improvements include: (1) adding lncRNA and disease similarity network information to the model; (2) extracting the most important information from the different similarity matrices through nonlinear fusion; (3) optimizing the model parameters with the GP-MI algorithm from Bayesian optimization. The models used in this paper are semi-supervised learning models that do not require negative samples. The prediction accuracy of the NRLMF and DNILMF-BO models was evaluated with 10-fold cross validation (10-CV). Both models outperform the other four comparison models; the AUC of DNILMF-BO, which builds on NRLMF-LDA, is 4.36% higher than that of NRLMF-LDA, and its AUPR is 14.49% higher. For isolated lncRNAs and diseases, both models can predict diseases related to isolated lncRNAs and lncRNAs related to isolated diseases. With DNILMF-BO, the AUC for predicting diseases related to isolated lncRNAs is 15.99% higher than with NRLMF-LDA, and the AUC for predicting lncRNAs related to isolated diseases is 5.02% higher. In the NRLMF-LDA case studies of non-small cell cancer, cervical cancer, and glioma, all of the top five predicted lncRNAs for non-small cell cancer and for cervical cancer were confirmed, as were the top four for glioma. Case studies of breast cancer, lung cancer, and colon cancer show that DNILMF-BO is an effective method for predicting lncRNA-disease associations. |
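
To make the core idea concrete, the following is a minimal sketch of plain logistic matrix factorization on a toy binary lncRNA-disease matrix. It is not the NRLMF-LDA or DNILMF-BO implementation: it leaves out neighborhood regularization, neighborhood smoothing, similarity fusion, and Bayesian optimization. The function names (`fit_lmf`, `predict`), the hyperparameter values (`rank`, `c`, `lam`, `lr`, `epochs`), and the toy matrix `Y` are all assumptions made for illustration. Each pair's association probability is modeled as the logistic function of the inner product of a latent lncRNA vector and a latent disease vector, and known associations receive a larger weight `c` than unknown pairs.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fit_lmf(Y, rank=2, c=5.0, lam=0.1, lr=0.02, epochs=1000, seed=0):
    """Fit plain logistic matrix factorization by batch gradient ascent.

    Y   : binary matrix (rows: lncRNAs, columns: diseases); 1 = known association
    c   : illustrative weight on known associations (assumed value)
    lam : L2 penalty on the latent factors (assumed value)
    Returns U (one latent vector per lncRNA) and V (one per disease).
    """
    rng = random.Random(seed)
    m, n = len(Y), len(Y[0])
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(m)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n)]
    for _ in range(epochs):
        # Start each gradient with the L2 penalty term.
        gU = [[-lam * u for u in row] for row in U]
        gV = [[-lam * v for v in row] for row in V]
        for i in range(m):
            for j in range(n):
                p = sigmoid(dot(U[i], V[j]))
                y = Y[i][j]
                # Derivative of c*y*log(p) + (1-y)*log(1-p) w.r.t. the score.
                g = c * y * (1.0 - p) - (1 - y) * p
                for k in range(rank):
                    gU[i][k] += g * V[j][k]
                    gV[j][k] += g * U[i][k]
        # Apply both updates only after all gradients are computed.
        for i in range(m):
            for k in range(rank):
                U[i][k] += lr * gU[i][k]
        for j in range(n):
            for k in range(rank):
                V[j][k] += lr * gV[j][k]
    return U, V


def predict(U, V, i, j):
    """Predicted association probability for lncRNA i and disease j."""
    return sigmoid(dot(U[i], V[j]))


# Toy data: two groups of lncRNAs, each associated with its own pair of diseases.
Y = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
U, V = fit_lmf(Y)
p_known = predict(U, V, 0, 0)    # pair with a known association
p_unknown = predict(U, V, 0, 3)  # pair with no known association
print(round(p_known, 3), round(p_unknown, 3))
```

In this sketch unknown pairs are not treated as confirmed negatives; they are simply down-weighted relative to known associations, which is one common way to handle the missing negative samples mentioned above. A neighborhood-regularized variant would add terms that pull the latent vectors of similar lncRNAs (and of similar diseases) toward each other.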