Numerous studies have shown that long non-coding RNA (lncRNA) is closely tied to biological processes: it is involved in gene expression, cell proliferation, genetic regulation, and the occurrence of many diseases. Predicting lncRNA-disease associations can therefore help researchers obtain relevant biological information, understand pathogenesis, and better diagnose and prevent diseases. Most currently known lncRNA-disease association pairs come from biological experiments. Experimental validation is the most authoritative way to establish an association, but it is costly and labor-intensive. Numerous computational methods have therefore emerged that mine potential lncRNA-disease associations from existing lncRNA biological information. Because association data are limited, however, many of these methods struggle to reach high accuracy. The most distinctive property of lncRNA-disease association data is that they contain only positive samples, which is ill-suited to many fully supervised models; with only positive samples, and very few of them, many computational methods suffer. To address these problems, this paper investigates two lncRNA-disease association prediction models: LDAF_GAN, which is based on an association-filtering generative adversarial network, and LDAF_VGAN, which incorporates variational Bayesian inference into LDAF_GAN. The LDAF_GAN model consists of a generator and a discriminator, but differs from the traditional GAN by adding a filtering operation and negative sampling. The filtering multiplies the output of the generator element-wise with the real data before 
it is passed to the discriminator, so that the generated results focus only on the known associations (i.e., the entries of the association matrix equal to 1). Negative sampling draws some entries from the data with unknown association (i.e., entries of the association matrix equal to 0) and treats them as assumed unassociated negative samples. A regularization term added to the loss function then requires the generated values to be close to 1 for positive samples and close to 0 for the sampled negatives, which prevents the model from fooling the discriminator by simply generating all-ones outputs and so yields a better fit. In addition, to improve generalization and performance on small-sample datasets, the LDAF_VGAN model is obtained by incorporating Bayesian inference into LDAF_GAN: the parameters of the generative adversarial network change from single values to distributions, and the network parameters are selected from the mean of values sampled from those distributions, which adds uncertainty to the model and improves its generalization. In model evaluation experiments, LDAF_GAN achieves better prediction results than BiGAN, CNNLDA, NBCLDA, TILDA, and LDAP. The AUC values of LDAF_GAN under five-fold cross-validation on two publicly available datasets are 0.976 and 0.914, respectively. In the case study, LDAF_GAN gives the top ten predicted disease associations for each of six lncRNAs (H19, MALAT1, XIST, ZFAS1, UCA1, and ZEB1-AS1) and predicts well for all six; for H19 and UCA1, 100% of the predicted associations are confirmed by data from biological experiments. Likewise, the LDAF_VGAN model achieves strong results on the two publicly available datasets, with five-fold cross-validation AUC values of 0.981 and 0.898, respectively. In 
addition, LDAF_VGAN outperforms LDAF_GAN on the small-sample dataset and the new dataset, which demonstrates that Bayesian inference improves the generalization performance of the model and is advantageous on small-sample datasets. The experimental results show that the two models in this study can mine potential lncRNA-disease associations both for lncRNAs with known associations and for lncRNAs that have no known associations but do have sequence data, and that they achieve excellent prediction results.
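
The filtering and negative-sampling steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the per-row sampling scheme, and the squared-error form of the regularization term are all assumptions made for illustration, since the abstract does not specify them. The sketch shows why filtering alone is insufficient (an all-ones generator output becomes identical to the real data after filtering) and how a penalty on sampled negatives distinguishes the two cases.

```python
import numpy as np


def filter_generator_output(generated, real):
    """Element-wise product: keep generated scores only where real == 1."""
    return generated * real


def sample_negative_mask(real, n_neg, rng):
    """Return a 0/1 mask marking up to n_neg zero entries per row.

    The marked entries are treated as assumed negatives (hypothetical scheme).
    """
    mask = np.zeros_like(real)
    for i, row in enumerate(real):
        zero_idx = np.flatnonzero(row == 0)
        k = min(n_neg, zero_idx.size)
        if k == 0:
            continue  # row has no unknown entries to sample from
        chosen = rng.choice(zero_idx, size=k, replace=False)
        mask[i, chosen] = 1
    return mask


def regularizer(generated, real, neg_mask):
    """Squared-error penalty (an assumed form, not taken from the paper).

    Pushes generated values toward 1 on known positives and toward 0 on
    the sampled negatives; all other entries are ignored.
    """
    penalized = real + neg_mask  # positives and sampled negatives
    return float(np.sum(penalized * (real - generated) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy association matrix: rows are lncRNAs, columns are diseases.
    real = np.array([[1.0, 0.0, 0.0, 1.0],
                     [0.0, 1.0, 0.0, 0.0]])
    all_ones = np.ones_like(real)

    # After filtering, a trivial all-ones output matches the real data.
    print(np.array_equal(filter_generator_output(all_ones, real), real))

    # The regularizer separates the trivial output from a perfect one.
    neg_mask = sample_negative_mask(real, n_neg=1, rng=rng)
    print(regularizer(all_ones, real, neg_mask))
    print(regularizer(real, real, neg_mask))
```

In this toy setting the penalty is zero for an output that reproduces the association matrix and positive for the all-ones output, which is the behavior the regularization term is meant to enforce.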