Virtual screening in drug discovery uses computational techniques to identify biologically active small molecules from large compound libraries, substantially improving the efficiency of lead compound discovery. Ligand molecules naturally form graph structures whose nodes and edges represent atoms and bonds, respectively. Consequently, molecular graph neural networks can be used for end-to-end, multi-level representation learning. Drug development for new or potential drug targets is a research hotspot, but the success of virtual screening models usually depends on a large number of training samples. When few ligands with known bioactivities are available, ligand-based virtual screening struggles to achieve good predictive performance. Transfer learning can address this shortage of samples by bringing abundant information from source domains into ligand-based virtual screening. Therefore, this paper proposes a novel method, TL-MGNN, which uses transfer learning with molecular graph neural networks to accurately model and represent the bioactivities of ligands targeting GPCR proteins for which data are insufficient. The TL-MGNN pipeline consists of two steps: (i) pre-train the molecular graph neural network models on source-domain datasets with sufficient samples; (ii) fine-tune the models on the target-domain datasets and return the prediction results. TL-MGNN was tested on 54 target-domain datasets from a series of representative GPCR proteins covering most human GPCR subfamilies. The experimental results showed that TL-MGNN achieved the best performance on most datasets, significantly outperforming other GNN-based methods such as WDL-RF. WDL-RF is a deep learning-based method that requires a large number of samples for training; when training samples are insufficient, the model typically has difficulty converging. Similarly, the Attentive FP, GIN, Weave and MPNN methods also need enough ligand samples to
train the models. TL-MGNN uses knowledge from the source domains through transfer learning to help train the models, so it achieves better performance. In addition, we compared TL-MGNN with the WDL-RF method combined with transfer learning (TL-WDL-RF). TL-MGNN obtained an average improvement of 11.96% in r² and 3.69% in RMSE. Because TL-MGNN considers not only atom features but also bond information, it can further improve model performance. This paper also examined how the number of training samples from the target domains affects the performance of TL-MGNN. The results indicated that the size of the target-domain training set significantly affects model performance, and that TL-MGNN is most likely to yield the largest improvement when training samples are few. The effect of the number of training samples from the source domains on model performance was also examined. The results indicated that model performance improved on most datasets as the number of source-domain training samples increased.

To help users make better use of the drug virtual screening methods proposed by our team, we developed several virtual screening platforms. One is based on the GCN-based method TL-MGCN, and the other two are based on two novel multi-task learning methods, MTR-GL and MTR-ISLR. The platforms were built on an Apache + MySQL + PHP stack under Windows. Their main functions are to predict the bioactivity values of ligands interacting with GPCRs and to generate the corresponding molecular fingerprints. We designed the platforms' pages with HTML and CSS, implemented the core virtual screening algorithms in Python, handled the interaction between the front end and the back end with PHP, and finally returned the prediction results to users. Additionally, the security design of the platforms was considered, mainly
including restrictions on repeated operations, validation of inputs, and prevention of cross-site scripting (XSS). Finally, we have made all data and source code freely available. Many virtual screening platforms and software tools have been developed, but the number of completely free tools is in fact very limited; these platforms are therefore of practical significance for drug discovery.
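To make the statement that ligands are graphs (atoms as nodes, bonds as edges) and that bond information can be added to atom features concrete, here is a minimal, hypothetical sketch using only the standard library. The molecule (ethanol, C-C-O), the one-hot atom and bond vocabularies, and the functions `build_graph`, `message_passing_step` and `readout` are illustrative assumptions, not the featurization or architecture of TL-MGNN. A real model would parse SMILES with a cheminformatics toolkit and apply learned weights and nonlinearities at each step.

```python
# Hypothetical toy example: ethanol (C-C-O) as a molecular graph.
# Atom and bond features are illustrative one-hot encodings, not TL-MGNN's.
from typing import Dict, List, Tuple

ATOM_TYPES = ["C", "N", "O"]
BOND_TYPES = ["single", "double", "triple", "aromatic"]


def one_hot(value: str, choices: List[str]) -> List[float]:
    return [1.0 if value == c else 0.0 for c in choices]


def build_graph(atoms: List[str], bonds: List[Tuple[int, int, str]]):
    """Return atom (node) features and an adjacency list carrying bond features."""
    node_feats = [one_hot(a, ATOM_TYPES) for a in atoms]
    edges: Dict[int, List[Tuple[int, List[float]]]] = {i: [] for i in range(len(atoms))}
    for u, v, btype in bonds:
        bf = one_hot(btype, BOND_TYPES)
        edges[u].append((v, bf))  # bonds are undirected, so store both directions
        edges[v].append((u, bf))
    return node_feats, edges


def message_passing_step(node_feats, edges):
    """One round of sum aggregation: each message is the neighbour's atom features
    concatenated with the bond features; the sum is added to the node's own
    zero-padded state. No learned weights are used in this sketch."""
    n_bond = len(BOND_TYPES)
    updated = []
    for i, h in enumerate(node_feats):
        agg = [0.0] * (len(h) + n_bond)
        for j, bond_feat in edges[i]:
            msg = node_feats[j] + bond_feat  # list concatenation: atom + bond
            agg = [a + m for a, m in zip(agg, msg)]
        self_state = h + [0.0] * n_bond
        updated.append([s + a for s, a in zip(self_state, agg)])
    return updated


def readout(node_states):
    """Sum pooling over atoms yields a fixed-length molecule-level vector."""
    return [sum(col) for col in zip(*node_states)]


atoms = ["C", "C", "O"]
bonds = [(0, 1, "single"), (1, 2, "single")]
x, e = build_graph(atoms, bonds)
h1 = message_passing_step(x, e)
mol_vec = readout(h1)
print(mol_vec)  # prints [5.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0]
```

The last four positions of the molecule vector come from bond features only; an atom-only model would lose that channel entirely, which is the kind of information the abstract credits for TL-MGNN's advantage over TL-WDL-RF.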
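The two-step pipeline (pre-train on a data-rich source domain, then fine-tune on a small target domain) can be sketched as follows. This is a hedged illustration under strong simplifying assumptions: a linear regressor stands in for the molecular GNN so the code runs with the standard library only, the "source" and "target" tasks are synthetic noise-free linear relationships with similar but not identical weights, and all names (`train`, `make_data`, the weight values) are hypothetical rather than taken from TL-MGNN.

```python
# Hypothetical two-step transfer-learning sketch; a linear model replaces the GNN.
from typing import List, Tuple

Sample = Tuple[List[float], float]  # (feature vector, bioactivity value)


def predict(params: List[float], x: List[float]) -> float:
    *w, b = params
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mse(params: List[float], data: List[Sample]) -> float:
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def train(params: List[float], data: List[Sample], steps: int, lr: float = 0.1) -> List[float]:
    """Full-batch gradient descent on the MSE; returns new parameters."""
    p = list(params)  # copy so the caller's parameters are left untouched
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(p)
        for x, y in data:
            r = predict(p, x) - y
            for k, xk in enumerate(x):
                grad[k] += 2.0 * r * xk / n
            grad[-1] += 2.0 * r / n  # bias gradient
        p = [pk - lr * gk for pk, gk in zip(p, grad)]
    return p


def make_data(points, w: List[float], b: float) -> List[Sample]:
    return [(list(x), predict(w + [b], list(x))) for x in points]


# Source domain: 81 samples on a grid (synthetic, "sufficient" data).
grid = [-1.0 + 0.25 * i for i in range(9)]
source = make_data([(a, c) for a in grid for c in grid], [2.0, -1.0], 0.5)

# Target domain: only 8 samples from a related but different task.
target_points = [(-1, -1), (-1, 1), (1, -1), (1, 1),
                 (0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5)]
target = make_data(target_points, [2.2, -0.9], 0.4)

pretrained = train([0.0, 0.0, 0.0], source, steps=300)  # step (i): pre-train
finetuned = train(pretrained, target, steps=5)          # step (ii): fine-tune
scratch = train([0.0, 0.0, 0.0], target, steps=5)       # baseline without transfer

print(f"fine-tuned MSE: {mse(finetuned, target):.4f}")
print(f"from-scratch MSE: {mse(scratch, target):.4f}")
```

Both target-domain runs take the same five gradient steps; the only difference is the starting point. Because the pre-trained weights already lie close to the target solution, fine-tuning reaches a much lower error, which mirrors (in a deliberately idealized setting) the abstract's observation that transfer helps most when target samples are scarce.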
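The reported gains ("11.96% in r² and 3.69% in RMSE") are relative improvements, and the direction differs between the two metrics: higher r² is better, while lower RMSE is better. The sketch below makes that explicit. It rests on two assumptions the abstract does not state: that r² is the squared Pearson correlation between measured and predicted bioactivities (the coefficient of determination is a common alternative), and that the "average improvement" is the mean of per-dataset relative improvements. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Squared Pearson correlation (assumed definition of r^2)."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


def relative_improvement(baseline: float, new: float, higher_is_better: bool) -> float:
    """Percentage improvement of `new` over `baseline`."""
    change = (new - baseline) if higher_is_better else (baseline - new)
    return 100.0 * change / baseline


def average_improvement(pairs: List[Tuple[float, float]], higher_is_better: bool) -> float:
    """Mean of per-dataset relative improvements (assumed aggregation)."""
    vals = [relative_improvement(b, n, higher_is_better) for b, n in pairs]
    return sum(vals) / len(vals)


# Predictions shifted by a constant: perfectly correlated, but biased.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 3.0, 4.0, 5.0]
print(r_squared(y_true, y_pred), rmse(y_true, y_pred))  # prints 1.0 1.0
```

The usage example shows why both metrics are worth reporting: a constant offset leaves the correlation-based r² untouched but is fully visible in RMSE.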
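The three security measures mentioned (restricting repeated operations, validating inputs, and preventing XSS) can be illustrated with a short sketch. The platforms themselves use PHP for the front-end/back-end interaction, so this Python version is only an analogue of the idea, not the platforms' code: it assumes a conservative character whitelist as a cheap pre-filter for SMILES input (not a full SMILES parser), a hypothetical length limit, HTML escaping via the standard `html.escape` before echoing user input, and a per-client minimum interval between submissions as one possible reading of "restriction of repeated operations". `SMILES_CHARS`, `MAX_LEN` and `SubmissionThrottle` are hypothetical names.

```python
import html
import re
import time
from typing import Dict, Optional

# Hypothetical whitelist of characters that may appear in a SMILES string.
# This is a cheap pre-filter only; it does not check SMILES syntax.
SMILES_CHARS = re.compile(r"[A-Za-z0-9@+\-\[\]()=#$%/\\.:*]+")
MAX_LEN = 500  # hypothetical limit on input length


def validate_smiles(text: str) -> bool:
    """Reject empty, overly long, or out-of-alphabet input before any model runs."""
    text = text.strip()
    return 0 < len(text) <= MAX_LEN and SMILES_CHARS.fullmatch(text) is not None


def render_result(user_input: str, value: float) -> str:
    """Escape user-supplied text before embedding it in HTML (XSS prevention)."""
    safe = html.escape(user_input, quote=True)
    return f"<p>Input: {safe}</p><p>Predicted bioactivity: {value:.3f}</p>"


class SubmissionThrottle:
    """Reject a client's repeated submissions within `min_interval` seconds."""

    def __init__(self, min_interval: float = 10.0) -> None:
        self.min_interval = min_interval
        self._last: Dict[str, float] = {}

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        last = self._last.get(client_id)
        if last is not None and now - last < self.min_interval:
            return False  # too soon; rejected attempts do not reset the timer
        self._last[client_id] = now
        return True
```

Validation and escaping are complementary: the whitelist stops most malformed or malicious input early, while escaping protects the page even if some unexpected text reaches the output. The optional `now` argument makes the throttle easy to test deterministically.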