The superfamily of G protein-coupled receptors (GPCRs) comprises more than 800 seven-transmembrane receptors involved in a variety of physiological and pathological processes. GPCRs have become important drug targets in modern medicine: approximately 36% of drugs in clinical trials target human GPCRs. However, the biological activity of ligands targeting orphan GPCRs has been insufficiently studied, so the available experimental data are extremely scarce. To address this shortage of samples, we designed a new multi-source transfer learning algorithm based on graph neural networks (Multi-source transfer learning with graph neural network, MSTL-GNN) to predict the biological activity values of ligand molecules binding to orphan GPCRs. The MSTL-GNN algorithm consists of four parts: (1) dataset generation based on sampling with replacement; (2) construction of a virtual screening model based on a graph neural network; (3) construction of a multi-source transfer learning model based on parameter transfer; and (4) construction of a prediction model for ligand activity values based on ensemble learning. This thesis verifies the effectiveness of the proposed method on a total of 60 GPCRs in 12 datasets, and four sets of homologous GPCRs are selected to assist in building the model for each GPCR. Two common evaluation indicators, the squared correlation coefficient (R²) and the root mean square error (RMSE), are used to evaluate the regression predictions. The experimental results show that MSTL-GNN achieves an average R² of 0.511 and an average RMSE of 0.580. Compared with the weighted deep learning algorithm, MSTL-GNN increases R² by 34.76% and decreases RMSE by 13.16% on average; compared with the single-source transfer learning algorithm, it increases R² by 17.26% and decreases RMSE by 8.71% on average. In addition, we examined how the performance of MSTL-GNN, weighted deep learning, single-source transfer learning, random forest, and support vector regression varies with sample size. MSTL-GNN improves prediction performance when the sample size is small, and it also outperforms the other algorithms when the sample size is large.
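The two evaluation indicators and the relative changes quoted in the abstract can be computed as in the following minimal sketch. It rests on assumptions: the abstract does not state whether R² is the squared Pearson correlation or the coefficient of determination 1 − SS_res/SS_tot, so both are shown, and the percentage changes are assumed to be relative to the baseline's score. The function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted activity values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def squared_pearson(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square of the Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


def coefficient_of_determination(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - SS_res / SS_tot; unlike squared_pearson it can be negative."""
    mt = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_change_pct(new: float, baseline: float) -> float:
    """Percentage change of `new` with respect to `baseline` (assumed convention)."""
    return (new - baseline) / baseline * 100.0
```

The two R² definitions coincide for a least-squares linear fit with an intercept, but they can diverge sharply otherwise. For example, predictions `[2, 4, 6]` for observed values `[1, 2, 3]` give a squared Pearson correlation of 1.0 but a coefficient of determination of −6.0, so it matters which one a comparison uses.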
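The four-part workflow can be illustrated with a deliberately simplified sketch. It is not the thesis implementation, and several assumptions are not from the source. Each ligand is assumed to be already encoded as a fixed-length feature vector. A linear model trained by stochastic gradient descent stands in for the graph neural network. "Parameter transfer" is taken to mean initializing the target model with weights pretrained on one homologous GPCR and then fine-tuning. The ensemble is a plain average with one model per source. The names `bootstrap`, `fit`, and `mstl_predict` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# (feature vector, activity value); assumed encoding, the thesis uses molecular graphs
Sample = Tuple[List[float], float]


def bootstrap(data: Sequence[Sample], rng: random.Random) -> List[Sample]:
    """Step (1): draw len(data) samples with replacement."""
    return [rng.choice(data) for _ in range(len(data))]


def predict(w: List[float], x: List[float]) -> float:
    # The last weight is the bias; zip stops at len(x), so it is excluded here.
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def fit(w: List[float], data: Sequence[Sample],
        epochs: int = 200, lr: float = 0.05) -> List[float]:
    """Step (2) stand-in: SGD on squared error for a linear model (NOT a GNN)."""
    w = list(w)  # copy so pretrained source weights are left untouched
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
            w[-1] -= lr * err
    return w


def mstl_predict(sources: Sequence[Sequence[Sample]], target: Sequence[Sample],
                 x_new: List[float], seed: int = 0) -> float:
    rng = random.Random(seed)
    dim = len(target[0][0])
    preds = []
    for source in sources:
        w_src = fit([0.0] * (dim + 1), source)      # pretrain on one homologous GPCR
        w_tgt = fit(w_src, bootstrap(target, rng))  # step (3): transfer weights, fine-tune
        preds.append(predict(w_tgt, x_new))
    return sum(preds) / len(preds)                  # step (4): average the ensemble
```

Starting each target model from source-pretrained weights is what lets a small target set be used effectively. Drawing a different bootstrap sample per source gives the ensemble members some diversity. A faithful version would replace `fit` and `predict` with a graph neural network over molecular graphs.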