Study On The Compound-Protein Interaction Based On Graph Neural Network

Posted on: 2023-10-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Z Wan
Full Text: PDF
GTID: 1524306809473514
Subject: Drug design
Abstract/Summary:
In recent years, applications of deep learning in drug discovery have developed rapidly and have greatly shortened development time. However, traditional deep learning algorithms operate on Euclidean data, whereas the biological big data in drug development often have complex, high-dimensional non-Euclidean structures, such as the three-dimensional structures of molecules and the relational graphs that characterize the interconnections among biological data. To fit deep learning algorithms, these non-Euclidean data are often transformed into Euclidean structures, which can lose useful information and make model interpretability analysis more difficult. Graph neural networks extend traditional deep learning algorithms to non-Euclidean data and can fully characterize biological data, thus advancing deep learning in drug discovery.

Compound-protein interaction is the most fundamental and important area in drug discovery, and it plays an essential role in virtual screening, prediction of drug toxicity and side effects, drug repurposing, and related tasks. In this area, the geometric structures of compounds, proteins, and their binding complexes, as well as the various interaction associations involved, can be characterized by non-Euclidean data such as molecular graphs, relational graphs, and point clouds. Graph neural networks provide an intuitive view of these non-Euclidean data and therefore have great potential for compound-protein interaction studies.

Against this background, this dissertation studies compound-protein interaction with graph neural networks. Chapter 1 gives an overview of deep learning models based on non-Euclidean data, including graph neural networks, and of their applications in compound-protein interaction studies. Our own research on this topic follows in two parts. The first part (Chapter 2) uses inductive graph neural networks to predict compound-protein interactions: the heterogeneous compound-protein relational graph is represented in a homogeneous way, and inductive graph aggregators are then employed to handle cold-start problems. The second part (Chapter 3) uses deep learning on point clouds to predict compound-protein binding affinities: the structures of binding complexes are represented as point clouds, and a dynamic graph convolutional neural network is then employed to learn representations of the point clouds.

With the accumulation of massive biological data and the rise of graph neural networks, computational models based on relational graphs have achieved high accuracy in compound-protein interaction prediction. However, most existing models represent their data as heterogeneous relational graphs, and the heterogeneity of the node and edge attributes hinders message passing and aggregation on the graph. Furthermore, most of these models are transductive and cannot be applied to new data outside the training set. To address these two problems, in Chapter 2 we constructed a model named CPI-IGAE for compound-protein interaction prediction, based on a weighted homogeneous graph and an inductive graph neural network. The relational graph is represented as a homogeneous graph by using ligand-based protein representations, which helps message exchange on the graph, while the inductive graph neural network used for graph representation learning enables the model to handle cold-start problems. CPI-IGAE learns the weighted homogeneous graph in an end-to-end manner, and model comparisons show that its predictive ability outperforms that of many published methods. The ablation study demonstrates the effectiveness of the weighted homogeneous graph, and the visualization analysis and the cold-start simulation demonstrate the effectiveness of the inductive graph neural network. Furthermore, some of the novel compound-protein interactions predicted by CPI-IGAE are verified by the literature. In this work, we constructed a graph neural network-based tool for predicting compound-protein interactions, which can provide a reliable reference for pharmaceutical researchers and a novel perspective for relational graph-based studies.

Compound-protein binding affinity prediction is a more demanding task than interaction prediction, and it is performed as a regression task using experimentally determined activity values as labels. The three-dimensional structure of the compound-protein complex is important information for binding affinity prediction. It can be characterized by point clouds, which fully preserve the original geometric information, but existing works that apply deep learning to point clouds ignore the connections between points. To solve this problem, in Chapter 3, after characterizing the complex structures as point clouds, we processed them with a dynamic graph convolutional neural network (DGCNN), which considers the local associations within point clouds. DGCNN was further combined with the machine learning model XGBoost to enhance predictive power, and the performance of the combined model exceeds that of several published methods. The ablation study demonstrates the effectiveness of the point cloud data structure, and the visualization analysis of the features extracted by DGCNN reveals the advantages of the model architecture and its role in feature extraction. Furthermore, the visualization analysis of the dynamic graphs generated by DGCNN during inference demonstrates that the model can extract local geometric information at multiple scales and can potentially discover interaction patterns beyond human cognitive experience. Finally, we performed uncertainty estimation to characterize the credibility of the predictions quantitatively.

Overall, this dissertation demonstrates the enormous potential of graph neural networks in compound-protein interaction studies. Graph neural networks have a strong capacity for representing and extracting information from non-Euclidean data, which can enhance the creativity of human experts in drug design.
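The dynamic graph idea described for Chapter 3 can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract gives no layer sizes, neighbour counts, or atom features, so the function names `knn_graph` and `edge_conv`, the toy coordinates, and the weight matrices below are all hypothetical. The sketch follows the general DGCNN recipe of building a k-nearest-neighbour graph, forming edge features from each point and its offset to each neighbour, applying a shared linear map with ReLU, and max-pooling over neighbours. The graph is then rebuilt in feature space before the next layer, which is what makes it "dynamic".

```python
import math


def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbours (self excluded)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def edge_conv(features, neighbours, weight):
    """One simplified EdgeConv step.

    For point i and neighbour j the edge feature is [x_i, x_j - x_i].
    Each edge feature goes through a shared linear map (rows of `weight`)
    followed by ReLU, and the results are max-pooled over the neighbours.
    """
    out = []
    for i, x_i in enumerate(features):
        messages = []
        for j in neighbours[i]:
            edge = list(x_i) + [b - a for a, b in zip(x_i, features[j])]
            messages.append(
                [max(0.0, sum(w * e for w, e in zip(row, edge))) for row in weight]
            )
        out.append([max(column) for column in zip(*messages)])
    return out


# Toy point cloud: four hypothetical atom coordinates (not real complex data).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]

# Hypothetical weights: layer 1 maps 6-dim edge features to 2 dims,
# layer 2 maps 4-dim edge features to 2 dims.
w1 = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
w2 = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

# Layer 1: graph built from spatial coordinates.
h1 = edge_conv(points, knn_graph(points, k=2), w1)
# Layer 2: graph rebuilt from the learned features (the "dynamic" part).
h2 = edge_conv(h1, knn_graph(h1, k=2), w2)
print(h2)
```

In a trained model the weights would be learned, and per-point features would typically be pooled into a single vector per complex before a regressor is applied. The abstract mentions combining DGCNN with XGBoost but does not say how, so that step is left out here.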
Keywords/Search Tags: Graph neural network, Compound-protein interaction, Inductive learning, Dynamic graph convolutional networks on point clouds, Compound-protein binding affinity prediction