Recognition Of Ligand-binding Sites In Proteins Based On Deep Learning

Posted on: 2022-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Liu
Full Text: PDF
GTID: 2480306329471824
Subject: Computer technology
Abstract/Summary:
Many biological processes require the participation of proteins, which play an indispensable role within and between cells by binding to one another, to cells, and to other molecules. Identifying the residues in a protein that can bind small molecules is important for understanding protein function and for designing targeted drugs. The traditional approach is difficult, expensive, and time-consuming, which has led to the emergence of computational methods. Most current computational methods use the electrochemical properties of amino acid residues in known proteins to predict whether each residue is part of a small-molecule binding site. Depending on whether the three-dimensional coordinates of the amino acids are used, these methods can be roughly divided into sequence-based and structure-based approaches. By technique, they fall into three categories: template comparison, docking, and traditional machine learning. A prediction can combine the properties of both the protein and the ligand, or rely on protein information alone. In the latter case, the residues at the binding position are considered independently of the ligand; this is the setting discussed in this thesis.

As neural networks become the dominant method in more and more tasks, machine learning is shifting from traditional methods to deep learning. Growing volumes of data and increasingly powerful computing resources have made deep network architectures mainstream. Deep convolutional neural networks have been applied successfully to protein binding-site prediction, but they have difficulty capturing long-distance dependencies along the protein primary sequence, so we reduce the number of convolution layers and combine them with a gated recurrent unit. Sequence features, however, are only part of the picture. Proteins are essentially irregular spatial objects, and extracting their spatial features is also a problem to be solved. Abstracting spatially structured objects into graphs in non-Euclidean space and applying a graph neural network to them has proved very effective.

We propose a ResNet, GCN, and LSTM integrated method (RGLIM) for protein-ligand binding site prediction that relies on template-free protein spatial structure information. The method has three parts. First, 10 ResNet basic blocks are stacked to extract hierarchical features for each amino acid in the protein. Second, two GCN layers combined with dynamic convolution aggregate the neighborhood of each amino acid to obtain its spatial structure information. Third, the outputs of the first two steps are concatenated to form the final protein embedding, and a gated recurrent unit captures the long-distance dependencies between residues to predict whether each residue belongs to a binding site. The method is trained and tested on a set of 6000 non-redundant proteins collected from BioLiP. Experiments show that the Matthews correlation coefficient (MCC) improves by at least 0.02 while requiring less computation and training faster.
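For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how the Matthews correlation coefficient can be computed for per-residue binary predictions (1 = binding residue, 0 = non-binding). The function names and the toy labels are assumptions made for illustration only; they are not taken from the thesis and do not reproduce its results.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn) for binary labels, 1 = binding residue."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example (hypothetical labels): 8 residues, 2 of which truly bind.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(round(matthews_corrcoef(*counts), 3))  # prints 0.333
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which makes it a reasonable choice when binding residues are a small minority of all residues.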
Keywords/Search Tags: protein, small molecule, ligand binding, ResNet, graph convolutional neural network