Drugs have greatly improved the quality of human life. Effectiveness and safety are the two critical aspects of the drug discovery process: effectiveness is established by identifying drug-target interactions, and safety is secured by testing toxicity. However, analysing and determining drug-target interactions, compound-protein interactions, and compound toxicity through high-throughput screening experiments is very expensive, time-consuming, and challenging. Computational methods for drug discovery are efficient and inexpensive, and have therefore attracted increasing attention. Compared with wet-lab experiments, computational prediction of drug-target interactions, compound-protein interactions, and compound toxicity can provide more accurate and safer candidate drug-target pairs for subsequent biological experiments, and can reduce the time and cost of those experiments in the drug discovery process. In practice, however, biomedical data are high-dimensional and sparse, and multi-omics biomedical data are insufficiently integrated. As a result, existing computational methods produce inaccurate predictions that match the outcomes of biological experiments poorly.

To integrate multiple types of biomedical data and handle the sparsity of drug-target interaction pairs, a drug-target interaction prediction algorithm is proposed that constructs a marginalized denoising model in heterogeneous networks. The algorithm integrates multiple drug similarity matrices and multiple target similarity matrices into a drug kernel matrix and a target kernel matrix, respectively, using a non-linear kernel fusion technique, and trains the marginalized denoising model on the heterogeneous network with global relations to handle data sparsity. Experimental results on several datasets indicate that, compared with existing algorithms, the proposed algorithm achieves higher AUC and AUPR values overall.

Because large-scale compound-protein interaction data are even sparser and of higher order, a computational prediction model that extracts only high-order features will over-generalize and predict less relevant drugs. To solve this problem, a new hybrid model is presented that integrates the architectures of a factorization machine (FM) and a graph neural network (GNN) to learn both low-order and high-order features of compounds and proteins. Learning low-order features captures frequent co-occurrences of features, whereas learning high-order features uncovers implicit features. Based on this model, a large-scale compound-protein interaction prediction algorithm is designed and implemented. Experimental results on several datasets, especially on a large-scale imbalanced dataset, show that learning both low-order and high-order features of compounds and proteins further improves the accuracy of compound-protein interaction prediction. The proposed algorithm outperforms existing compound-protein interaction prediction algorithms in terms of AUC, precision, recall, and F1-score. Western blot experiments also show that the proposed algorithm is effective and accurate in finding candidate target proteins.

The first two works of this dissertation address drug-target and compound-protein interaction prediction for drug effectiveness, while the third addresses drug toxicity prediction for drug safety. By attending to important features in the physicochemical properties, atoms, and chemical structures of compounds, the third work proposes a drug multi-toxicity prediction algorithm that uses a hierarchical attention network and multi-feature fusion. The hierarchical attention network attends to the important information within the physicochemical properties, atoms, and molecular fingerprints of a molecule individually, and also to the important fused feature information between physicochemical properties and atoms and between atoms and molecular fingerprints. The algorithm uses a multi-label classifier, which improves efficiency by sharing feature weights across the multiple toxicity tasks. Experimental results on the large-scale toxicity dataset Tox21 show that the proposed algorithm achieves higher AUC values in multi-toxicity prediction than existing toxicity prediction algorithms.

The work of this dissertation can promote further research on computational drug discovery, and its prediction results can provide candidates for subsequent biopharmaceutical experimental verification.
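
As an illustration of what "low-order feature" learning means for the factorization machine (FM) component named above, the following is a minimal sketch of a generic second-order FM score. It is not the dissertation's implementation: the function names `fm_score` and `fm_score_naive`, the parameter names `w0`, `w`, and `V`, and the plain-list representation are all assumptions made for this example, and the GNN half of the hybrid model is not shown. The pairwise term rewards features that co-occur (both nonzero) in the same input, which is the "frequent co-occurrence" behaviour described in the abstract.

```python
from typing import List


def fm_score(x: List[float], w0: float, w: List[float], V: List[List[float]]) -> float:
    """Second-order FM score: bias + linear term + pairwise interactions.

    x  : feature vector of length n (hypothetical compound/protein features)
    w0 : global bias
    w  : first-order weights, length n
    V  : latent factors, n rows of length k (row i embeds feature i)
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} v_if v_jf x_i x_j
        #         = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_score_naive(x: List[float], w0: float, w: List[float], V: List[List[float]]) -> float:
    """Same score computed by explicitly summing over all feature pairs i < j."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            score += dot * x[i] * x[j]
    return score
```

The first function uses the standard reformulation that costs O(kn) per input instead of O(kn^2), which matters when feature vectors are large and sparse. The second function is included only as a readable reference against which the fast version can be checked.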