Experimental drug development is costly, complex, and time-consuming, and few candidate drugs are ultimately proven suitable for clinical treatment. Recent studies have shown that, compared with traditional drug development, drug repurposing is cheaper, faster, and less risky, and it has received widespread attention. Drug repurposing is usually modeled as a recommendation-system problem, and identifying drug-disease associations can provide important information for both drug discovery and drug repurposing. At present, computational algorithms for drug-disease association prediction fall into three main categories. The first is based on network analysis: known drug and disease information is usually assembled into a heterogeneous network, and the topological properties and node information of that graph are used to predict drug-disease associations. The second is based on machine learning: commonly used machine learning models are applied to model and predict new associations between drugs and diseases. The third is based on matrix factorization or matrix completion. These methods assume that the underlying factors that determine drug-disease associations are highly correlated, that is, that the drug-disease matrix to be completed is low-rank; they therefore discover new drug-disease associations by constructing a low-rank association matrix that approximates the known drug-disease associations. However, the above methods assume a noise-free setting by default and do not handle sparse data well enough; that is, their robustness to interference is weak. They also have difficulty learning deep information from complex data and cannot fully extract its hidden information. In recent years, more and more large databases have become available for research on biological data such as drugs and diseases. Researchers have therefore gradually begun to apply deep neural networks
to the analysis of biological data. The advantage of a deep neural network is that it can extract highly effective features from large-scale data and can learn the complex relationship between the original input features and the output decisions. Deep generative models are among the most promising methods for unsupervised learning, and one of the mainstream models is the variational autoencoder (VAE). A VAE learns the distribution of the data rather than a single fixed feature representation, so it copes well with noise and missing values in the original input and greatly reduces their impact on the prediction results. At the same time, owing to its strong learning ability, a VAE can learn the deep information in complex data. This paper proposes DDVAE (Predicting drug-disease associations based on variational autoencoders), a drug-disease association prediction algorithm that generates new data by learning the latent variable distribution of the known data in order to predict drug-disease associations. First, the input data of the model are constructed from the drug and disease information in public biological databases: drug feature data, disease feature data, and known drug-disease associations. Second, after the dimensionality of the features is reduced by principal component analysis, the known drug-disease associations are used as supervision information (that is, the model reconstructs the known drug-disease association data) to train the improved variational autoencoder model. Finally, this paper extracts the latent-variable-layer feature vectors of the drugs and of the diseases from the trained models, uses them to generate drug-disease association predictions based on drug features and disease-drug association predictions based on disease features, respectively, and then votes on the two
prediction results to obtain the final drug-disease association predictions. The predictions are verified from several angles, and the performance of the DDVAE algorithm is analyzed. In the experiments, this paper compares DDVAE with the BBNR, DrugNet, MBiRW, and DRRS algorithms on a unified data set. The experimental results show that DDVAE improves overall prediction performance relative to these algorithms. In addition, further analysis and verification of the predicted, previously unknown drug-disease associations also demonstrate the practicality of the method.
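
The final voting step can be illustrated with a minimal sketch. The abstract does not specify the voting rule, so the example below assumes a simple average of the drug-side and disease-side score matrices; the function names (`vote`, `top_candidates`) and the toy scores are hypothetical and are not taken from the DDVAE implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns so a diseases x drugs matrix becomes drugs x diseases."""
    return [list(col) for col in zip(*m)]


def vote(drug_side: Matrix, disease_side: Matrix) -> Matrix:
    """Combine two score matrices by averaging (an assumed voting rule).

    drug_side    : drugs x diseases scores predicted from drug features
    disease_side : diseases x drugs scores predicted from disease features
    """
    disease_side_t = transpose(disease_side)
    return [
        [(a + b) / 2.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(drug_side, disease_side_t)
    ]


def top_candidates(scores: Matrix, known: List[List[int]], k: int) -> List[Tuple[int, int]]:
    """Return the k highest-scoring (drug, disease) pairs that are not already known."""
    candidates = [
        (scores[i][j], i, j)
        for i in range(len(scores))
        for j in range(len(scores[0]))
        if known[i][j] == 0
    ]
    # Sort by descending score; break ties by index for a deterministic order.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(i, j) for _, i, j in candidates[:k]]


# Toy example: 2 drugs, 3 diseases (all numbers are made up for illustration).
known = [[1, 0, 0], [0, 1, 0]]
drug_side = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.3]]
disease_side = [[0.8, 0.2], [0.4, 0.9], [0.8, 0.1]]

combined = vote(drug_side, disease_side)
ranked = top_candidates(combined, known, k=2)
```

In this toy setting, the unknown pair with the highest combined score is drug 0 with disease 2, because both the drug-side and the disease-side predictions score it highly. Other voting rules, such as keeping only pairs that both predictors rank near the top, would fit the same interface.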