| Machine learning is a branch of artificial intelligence (AI) in which computers simulate human learning behavior to acquire new knowledge and skills, and improve their own predictive performance by reorganizing the knowledge they have already learned. Machine learning methods have been applied throughout new drug development and provide a range of tools for drug discovery, including target identification and validation, ADMET property prediction, retrosynthesis prediction, and protein structure prediction. Combining new AI-based drug design methods with traditional computer-aided drug design is an important research direction. The work presented here is divided into two parts. The first part explores whether machine learning models can accelerate virtual screening and make it feasible on ultra-large-scale compound libraries. The second part uses homology modeling, molecular dynamics simulation, and virtual screening to build and validate a 3D structure of protease-activated receptor 4 (PAR4), a protein that is involved in the coagulation process and for which no crystal structure has been reported, in order to provide a protein structure for understanding the physiological function of the target and for drug design. Chapter 1: Introduction. This chapter introduces the concept and development of machine learning and its applications in drug research and development, focusing on target identification and validation, ADMET property prediction, retrosynthesis prediction, protein structure prediction, and virtual screening. It also introduces methods for protein structure prediction and commonly used homology modeling software. Chapter 2: Construction of a virtual screening model based on machine learning. To overcome the slow screening speed and heavy resource consumption of current virtual screening methods on an ultra-large-scale compound 
library, this work builds a machine-learning-based screening model. A clustered subset of 287,216 compounds from the ChemDiv compound library serves as the input for molecular docking, and virtual screening is carried out with the Vina and rDock docking programs. The docking results are then analyzed and labeled to produce a training dataset, a Chemprop machine learning model is trained on this dataset to obtain the virtual screening model, and the model is finally tested to verify its efficiency. The results show that the machine learning virtual screening model achieves good screening performance and screens 120 times faster than traditional virtual screening. Chapter 3: Homology modeling and structure validation of the PAR4 protein. Protease-activated receptor 4 (PAR4) is an important target involved in the coagulation process. However, the 3D structure of PAR4 has not yet been resolved, which limits innovative drug research on this target. To construct a reasonable 3D structural model of PAR4, we first identified conserved residues by comparing protein sequences from its homologous family, selected the crystal structures of the homologous proteins PAR1 and PAR2 as templates, and modeled the PAR4 structure with the homology modeling software MODELLER 9.25. The quality of the resulting models was evaluated and validated against multiple indicators. The best overall model was further optimized in energy and conformation using GROMACS, and a compound library containing active compounds and decoy compounds for the target was used in virtual screening to assess the screening and enrichment performance of the model. Finally, three PAR4 structures of relatively reliable quality, A1, B3, and D3, were obtained. The 3D model was used to analyze and summarize possible interaction modes between antagonists and the protein, which provides a structural basis for understanding the physiological function of the target and for 
structure-based drug design. Chapter 4: Summary and prospects. The work is summarized and its potential application scenarios are discussed. |
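
The enrichment analysis described for Chapter 3 is commonly quantified with an enrichment factor (EF): the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole library. The following is a minimal sketch of that calculation, not the code used in this work. It assumes that lower docking scores are better, and the function name `enrichment_factor`, the toy scores, and the 20% cutoff are all illustrative assumptions.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """Enrichment factor at the given top fraction of a ranked library.

    EF = (actives in top fraction / compounds in top fraction)
         / (actives in library / compounds in library)

    Assumption: lower scores are better (e.g. docking energies).
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Rank compounds from best (lowest) to worst (highest) score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    n_top = max(1, round(n_total * fraction))
    actives_in_top = sum(active for _, active in ranked[:n_top])
    return (actives_in_top / n_top) / (n_actives / n_total)


# Toy data (hypothetical): 10 compounds, 2 of them active.
scores = [-9.1, -8.7, -7.9, -7.5, -7.2, -6.8, -6.5, -6.1, -5.9, -5.2]
labels = [True, False, True, False, False, False, False, False, False, False]
ef_20 = enrichment_factor(scores, labels, fraction=0.2)
print(f"EF at top 20%: {ef_20:.2f}")
```

An EF above 1 means the model ranks actives ahead of decoys more often than random selection would, while an EF near 1 means it does not separate them.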