| Proteins are complex organic compounds in living organisms that provide the material basis for life activities. Proteins function mainly by binding to other proteins to form protein complexes. Because protein complexes play an important role in key cellular activities, studying their structure is of great significance. However, determining the structure of protein complexes through biochemical experiments is costly, which makes it more practical to obtain these structures through computational simulation of protein docking. Simulating protein docking usually involves two steps: (1) generating a large number of protein docking models through spatial search; (2) using an activity prediction method to score and rank the generated docking models, and selecting the top-ranked model as the final near-native protein complex structure. The selected docking models can provide a structural basis for drug and vaccine development. However, in current computational docking pipelines, the second step, predicting the activity of docking models, remains the bottleneck for accurate protein docking. To address this issue, this article uses PointNet, a deep learning method based on point clouds, to predict the activity of protein docking models. The method learns deep representations of geometric attributes and atomic features directly from the 3D structure of proteins, and builds a neural network model on them. Model construction is divided into four steps. (1) Preprocessing of the protein complex 3D structure: first, the PDB file of each docking model is preprocessed, which includes removing blank and non-atomic records, renumbering atoms, renumbering residues, and specifying the chain names of the protein complex. Second, the docking model is centered and aligned, and the docking interface is extracted as all residues of the two proteins that lie within 10.0 Å of any residue of the other subunit. (2) Converting the docking interface into point cloud format: the geometric characteristics and physicochemical information of the amino acid heavy atoms on the docking interface are calculated, including the three-dimensional coordinates of each atom, its van der Waals radius, atomic weight, atom type, valence, charge, and the chain it belongs to. Point cloud data are then constructed by taking the heavy atoms of the interface residues as points and their geometric and physicochemical information as point features. (3) Model construction: this article builds a deep learning model for activity prediction based on PointNet. The model consists of three modules: an Encoding Layer, a Maximum Pooling Layer, and a Fully Connected Layer. It takes the point cloud data of a protein docking model as input and predicts its activity. (4) Model evaluation: the model was trained with five-fold cross-validation on the ZDOCK Benchmark 4.0 dataset and compared with existing representative activity prediction methods in terms of SR (Success Rate) and HC (Hit Count). Further comparisons were conducted on the independent datasets ZDOCK Benchmark 5.5 and DockGround 1.0 to verify the effectiveness of the proposed model. In summary, this article first preprocesses the protein 3D structures and then converts the docking interface of each docking model into point cloud format. A deep learning model based on PointNet is then proposed for activity prediction of protein docking models and trained with five-fold cross-validation. The proposed model achieves high SR and HC values, and the experimental results show that it outperforms the compared algorithms. To further verify its effectiveness, the proposed model and the comparison methods were also tested on the independent datasets ZDOCK Benchmark 5.5 and DockGround 1.0. On ZDOCK Benchmark 5.5, the SR and HC of the proposed model within the top 22 ranked predictions were higher than those of the other algorithms, and on DockGround 1.0 the SR and HC of the proposed model were also higher than those of the other algorithms. The results on the independent datasets thus support the effectiveness and generalizability of the proposed model. The model therefore offers better predictive performance for the activity of protein docking models and can provide a structural basis for drug research and vaccine development. |
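
The pipeline summarized above (interface extraction with a 10.0 Å cutoff, conversion to a point cloud, and a PointNet-style encode / max-pool / fully-connected scorer) can be illustrated with a minimal NumPy sketch. This is not the article's implementation: the function names `interface_masks`, `init_params`, and `score_point_cloud` are hypothetical, the cutoff is applied per atom rather than per residue for brevity, the six non-coordinate feature columns are random placeholders standing in for van der Waals radius, atomic weight, atom type, valence, charge, and chain, and the weights are random and untrained.

```python
import numpy as np


def interface_masks(coords_a, coords_b, cutoff=10.0):
    """Flag atoms of each chain lying within `cutoff` angstroms of the other chain.

    Assumption: the cutoff is applied atom-to-atom; the abstract describes a
    residue-level criterion.
    """
    # Pairwise distance matrix of shape (n_atoms_a, n_atoms_b) via broadcasting.
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    close = dist <= cutoff
    return close.any(axis=1), close.any(axis=0)


def init_params(n_features, hidden=32, seed=0):
    """Random, untrained weights for the sketch (hypothetical layer sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 0.1, (n_features, hidden)), "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, hidden)), "b2": np.zeros(hidden),
        "w3": rng.normal(0.0, 0.1, (hidden,)), "b3": 0.0,
    }


def score_point_cloud(points, p):
    """PointNet-style forward pass: shared per-point encoder, max pooling, FC head."""
    # Encoding layer: the same weights are applied to every point (row).
    h = np.maximum(points @ p["w1"] + p["b1"], 0.0)
    h = np.maximum(h @ p["w2"] + p["b2"], 0.0)
    # Max pooling over points yields a global feature independent of point order.
    pooled = h.max(axis=0)
    # Fully connected head followed by a sigmoid, giving a score in (0, 1).
    logit = pooled @ p["w3"] + p["b3"]
    return float(1.0 / (1.0 + np.exp(-logit)))


# Toy geometry (made up): two chains of atoms placed along the x axis.
coords_a = np.array([[x, 0.0, 0.0] for x in range(0, 28, 4)], dtype=float)
coords_b = np.array([[x, 0.0, 0.0] for x in range(29, 49, 4)], dtype=float)

mask_a, mask_b = interface_masks(coords_a, coords_b)
interface_xyz = np.vstack([coords_a[mask_a], coords_b[mask_b]])
interface_xyz = interface_xyz - interface_xyz.mean(axis=0)  # center the interface

# Placeholder physicochemical features (random values, for illustration only).
rng = np.random.default_rng(1)
extra = rng.normal(size=(len(interface_xyz), 6))
cloud = np.hstack([interface_xyz, extra])  # one row per interface atom

params = init_params(cloud.shape[1])
score = score_point_cloud(cloud, params)
```

Because the global feature is obtained by max pooling over points, reordering the rows of `cloud` leaves `score` unchanged, which is the property that makes a PointNet-style network suitable for unordered sets of atoms. In practice the weights would be learned from labeled docking models rather than drawn at random.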