The outbreak of COVID-19 in 2020 drew worldwide attention to the impact of viruses. Developing effective drugs and vaccines against new viruses is an urgent problem for all of humanity. The first step of drug discovery is to find molecules that have medicinal activity against specific targets, so it is crucial to characterize the interactions between drug targets and small molecules. However, traditional experimental methods for discovering potential drugs are labor-intensive and time-consuming. Deep learning has flourished in recent years, and a variety of neural network models have been proposed that advanced natural language processing (NLP) and computer vision (CV). At the same time, the application of deep learning in biomedical fields has also developed rapidly, with breakthroughs in tasks such as predicting protein structures and drug properties. Applying deep learning models to investigate drug-target interactions is therefore a promising research direction.

This paper proposes a method for predicting drug-target binding affinity using deep learning models. The method uses an improved long short-term memory network (LSTM) and a graph neural network (GNN) to extract features from the target protein sequences and the drug molecule graphs, respectively. Their embeddings are then combined into a representation of the drug-target pair, which is fed into a fully connected network to predict the drug-target binding affinity. The specific contributions of this paper are as follows:

(1) The topological structure of drug molecules contains a lot of biochemical information, such as the number of chemical bonds and electrons, which cannot be effectively extracted by traditional deep learning models such as CNN and LSTM. To tackle this issue, graph neural networks are applied to extract features from non-Euclidean structured data. In this paper, we improve state-of-the-art graph neural networks, such as the graph convolutional network (GCN), the graph attention network (GAT), and the graph isomorphism network (GIN), to better extract features from drug molecule graphs. Moreover, we compare the results of the different graph neural networks.

(2) In this paper, the amino acid sequences of the target proteins are treated as analogous to the words and sentences of natural language. Accordingly, a long short-term memory (LSTM) model is applied to extract information from the amino acid sequences of the target proteins. First, several unlabeled protein datasets are used as a corpus for pre-training the model, so that it can capture latent biological information in amino acid sequences. The pre-trained LSTM is then used to extract embeddings of the target proteins for predicting binding affinity. This provides an innovative approach to representation learning for protein sequences.

(3) To address the lack of labeled datasets of drug molecules, this paper proposes pre-training strategies for graph neural networks. Unlike the pre-training of the LSTM, we define two types of learning tasks for the graph neural networks using an unlabeled dataset of drug molecules. Specifically, a semi-supervised learning task and a supervised learning task help the graph neural networks capture node-level and graph-level information, respectively. In this way, the generalization capability of the GNNs can be improved.

Finally, this paper compares its experimental results with state-of-the-art binding affinity prediction models on the same dataset, and demonstrates the effectiveness and accuracy of the proposed model in predicting binding affinity.
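
The overall data flow described above (encode the molecule graph, encode the protein sequence, concatenate the two embeddings, predict a scalar affinity) can be illustrated with a minimal sketch. It is not the model proposed in this paper. It assumes a single mean-aggregation message-passing layer as a simplified stand-in for GCN/GAT/GIN, replaces the pre-trained LSTM with a placeholder that averages per-residue embeddings, and uses untrained, seeded weights. All function names, dimensions, and the toy molecule and sequence are hypothetical.

```python
import random

def make_weights(rows, cols, seed):
    """Untrained, seeded weights (assumption); a real model would learn these."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def mean_vectors(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def message_passing_layer(node_feats, neighbors, W):
    """Average each atom with its bonded neighbors, then apply a linear map and ReLU."""
    updated = []
    for i, nbrs in enumerate(neighbors):
        pooled = mean_vectors([node_feats[j] for j in [i] + nbrs])
        updated.append([max(0.0, h) for h in matvec(W, pooled)])
    return updated

def encode_drug(node_feats, neighbors, W):
    """Graph-level embedding: one message-passing layer followed by a mean readout."""
    return mean_vectors(message_passing_layer(node_feats, neighbors, W))

def encode_protein(sequence, residue_table):
    """Placeholder for the LSTM encoder (assumption): mean of per-residue embeddings."""
    return mean_vectors([residue_table[aa] for aa in sequence])

def predict_affinity(drug_vec, protein_vec, w_out, bias=0.0):
    """Concatenate both embeddings and apply a single linear output unit."""
    pair = drug_vec + protein_vec
    return sum(w * v for w, v in zip(w_out, pair)) + bias

# Toy molecule: a 3-atom chain (0-1-2) with 2-dimensional one-hot atom features.
atom_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
bonds = [[1], [0, 2], [1]]  # neighbor list per atom

HIDDEN, EMB = 4, 3  # hypothetical embedding sizes
W_gnn = make_weights(HIDDEN, 2, seed=0)
residue_table = {aa: make_weights(1, EMB, seed=ord(aa))[0]
                 for aa in "ACDEFGHIKLMNPQRSTVWY"}
w_out = make_weights(1, HIDDEN + EMB, seed=1)[0]

drug_vec = encode_drug(atom_feats, bonds, W_gnn)
protein_vec = encode_protein("MKTAYIAK", residue_table)  # toy sequence
affinity = predict_affinity(drug_vec, protein_vec, w_out)
print(affinity)
```

Because the readout averages over atoms, the drug embedding does not depend on how the atoms are numbered, which is one reason graph encoders suit molecular topology better than sequence models applied to an arbitrary atom ordering.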