Research On Drug-Target Interaction Intelligent Prediction Model With Integrating Multi-source Information

Posted on: 2023-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Cheng
GTID: 2544307025462914
Subject: Software engineering
Abstract/Summary:
Identifying drug-target interactions (DTIs) is a key task in drug discovery and plays an important role in virtual screening and drug repurposing. Although biological experiments are the most reliable way to identify DTIs, they are usually costly and time-consuming. With the rapid growth of drug-related and target-related data and the rapid development of computer technology, many researchers have turned to machine learning-based methods to predict potential DTIs more efficiently. These methods normally treat DTI prediction as a binary classification task or a regression task, in which each drug-protein pair has a label indicating whether the drug and the protein interact. Although many such methods have been proposed, challenges remain. First, supervised machine learning methods require both positive and negative samples for training, but negative samples are often missing in practice, and many studies address this only by randomly selecting negative samples from unknown associations. Second, how to effectively use multi-source drug-related and target-related information to improve prediction performance is also a major challenge in this field. In view of this, this thesis focuses on the following work:

(1) Two types of hidden bias were identified that may cause overly optimistic prediction results when negative samples are selected randomly from unknown associations, and experiments on several prediction methods verified that both exist. Corresponding methods were then proposed to avoid these two types of hidden bias. Furthermore, a method was proposed to select reliable negative samples based on the shortest path length in the drug-protein-disease heterogeneous network, and the DTI dataset was constructed. The rationale of the method is that, in a drug-protein-disease heterogeneous network containing multiple types of association, the longer the shortest path between a drug and a protein, the less likely they are to interact. In addition, a compound-protein interaction (CPI) dataset was constructed by dividing positive and negative samples according to the binding affinity records between compounds (drug candidates) and proteins (target candidates).

(2) A DTI prediction framework, HNGO-DTI, was proposed to integrate topological information from the drug-protein-disease heterogeneous network with gene ontology (GO) annotation information of proteins. First, the PubChem fingerprints of drugs and the KSCTriad descriptors of proteins were calculated as initial features, and their low-dimensional representations were generated by fully connected (FC) layers. Then, all target associations (i.e., DTIs) were removed from the heterogeneous network, and the topological features of drugs and proteins were extracted from the network with heterogeneous graph neural networks (HGNNs). In addition, GO annotation features of proteins were extracted with graph neural networks (GNNs) from the GO term similarity networks and the GO term-protein bipartite networks. Finally, deep neural networks (DNNs) were used to predict potential DTIs. To validate the effectiveness of the method, HNGO-DTI was compared with several advanced prediction methods, and it achieved better prediction results than they did.

(3) A DTI prediction framework, DFDTI, was proposed to fuse different types of drug structure descriptors and protein structure descriptors. First, multiple structural descriptors of drugs and proteins were calculated as initial features, and their corresponding low-dimensional representations were generated by FC layers. Then, because different types of descriptors contribute differently to DTI prediction, the weights of the descriptors were learned automatically with a channel attention mechanism. In addition, a one-layer Transformer encoder was used to enhance the feature representation of the descriptors. Finally, potential DTIs were predicted by a DNN. The experimental results show that the method has good prediction performance.

(4) A DTI prediction framework based on ensemble deep learning, EDDTI, was proposed. First, the multi-source similarities of drugs and proteins were calculated as initial features. Then, every combination of a single drug feature and a single protein feature was fed into a separate DNN, so that multiple base learners were trained individually. Finally, the predictions of all base learners were combined to give the final prediction. In addition, to demonstrate the extensibility and superiority of the method, a variant of EDDTI, EDDTI-d, was proposed that uses drug descriptors and protein descriptors as initial features. The experimental results show that the method performs better than several advanced prediction methods.

In summary, this thesis studies how to construct a reliable DTI dataset and builds three DTI prediction models based on deep learning, all of which achieve satisfactory prediction performance. Its purpose is to assist in identifying DTIs and thereby improve the efficiency of drug discovery.
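
The negative sample selection idea in item (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the `min_len` threshold, the toy node labels, and the decision to skip unreachable pairs are all assumptions made for illustration. The sketch treats the drug-protein-disease heterogeneous network as an undirected, unweighted graph, runs a breadth-first search from each drug, and keeps drug-protein pairs whose shortest path length is at least `min_len`.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return the hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def select_negatives(edges, drugs, proteins, min_len=3):
    """Return (drug, protein, path_length) triples with path_length >= min_len.

    Assumption: pairs with no connecting path are skipped rather than
    treated as negatives; the abstract does not say how they are handled.
    """
    # Build an undirected adjacency map over all node types.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    negatives = []
    for drug in drugs:
        dist = bfs_distances(adj, drug)
        for protein in proteins:
            d = dist.get(protein)
            if d is not None and d >= min_len:
                negatives.append((drug, protein, d))
    # Longest paths first: these are the most confident negatives.
    return sorted(negatives, key=lambda item: -item[2])


# Hypothetical toy network: D* are drugs, P* proteins, Dis* diseases.
toy_edges = [
    ("D1", "P1"),    # known drug-target interaction
    ("D1", "Dis1"),  # drug-disease association
    ("P2", "Dis1"),  # protein-disease association
    ("P2", "Dis2"),
    ("D2", "Dis2"),
    ("P3", "Dis3"),  # P3 is disconnected from both drugs
]
toy_negatives = select_negatives(toy_edges, ["D1", "D2"], ["P1", "P2", "P3"])
print(toy_negatives)
```

On this toy network only the pair (D2, P1), five hops apart, passes the default threshold. Lowering `min_len` admits closer pairs, which trades the reliability of the negatives for a larger sample.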
Keywords/Search Tags: drug-target interactions, deep learning, negative sample selection, heterogeneous network, multi-source feature fusion