| Computer-aided drug design (CADD) technologies are widely used in drug design and discovery because of their low cost and high efficiency. In recent years, as three-dimensional (3D) structures of biological macromolecules have continued to accumulate, the advantages of molecular docking in lead compound discovery and target prediction have become increasingly evident. However, the scoring functions used in existing molecular docking methods still suffer from low accuracy. How to optimize scoring functions to improve the accuracy of docking-based target prediction and virtual screening is therefore an important research topic in the CADD field. In this thesis, we first systematically review structure-based target prediction methods, including the small-molecule and protein databases used in reverse molecular docking, the existing computational tools, and trends for future development. Second, we used a variety of machine learning methods to develop personalized scoring functions based on molecular interaction energy terms and molecular fingerprints. The predictive ability of the developed scoring functions was evaluated on the DUD-E and PDBbind datasets. The results show that the target-specific scoring functions for the 32 targets achieve an average area under the curve (AUC) of 0.973 on the DUD-E dataset, and that the generic scoring function achieves a Pearson correlation coefficient of 0.81 on the PDBbind 2016 core set. Compared with traditional scoring functions, the machine learning-based scoring functions perform better in both virtual screening and binding affinity prediction. Finally, we developed an online computational platform, ASFP, which supports descriptor generation, personalized scoring function construction, and virtual screening. The platform uses a series of computational tools to generate protein-ligand interaction energy terms and small-molecule structural fingerprints automatically and efficiently, and applies several machine learning algorithms to build target-specific scoring functions. In addition, ASFP provides personalized scoring functions for 15 protein targets and a generic scoring function for binding affinity prediction. Users can employ these scoring functions for target prediction and virtual screening, giving the platform good practical value and broad application prospects. |
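
The two evaluation metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes that the reported AUC is the area under the ROC curve for separating actives from decoys, and that the Pearson coefficient compares predicted with experimental binding affinities. The function names `roc_auc` and `pearson_r` and all numbers below are hypothetical toy values for illustration; they are not taken from the thesis or from ASFP.

```python
from math import sqrt


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random active outscores a random decoy.

    labels: 1 for active, 0 for decoy. Tied scores count as half a win.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    decoys = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy virtual-screening example (made-up scores): two actives, two decoys.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)

# Toy affinity example (made-up values): predicted vs. experimental pKd.
toy_predicted = [5.1, 6.3, 7.0, 8.2]
toy_measured = [5.0, 6.0, 7.5, 8.0]
toy_r = pearson_r(toy_predicted, toy_measured)
```

In practice, library routines such as scikit-learn's `roc_auc_score` and SciPy's `pearsonr` would normally be used; the plain-Python versions here only make the definitions explicit.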