| Progress in the pharmaceutical industry underpins public health and well-being, and drug innovation is one of the industry’s most important tasks. According to statistics, drug development fails mainly because candidate drugs have toxic side effects or insufficient biological activity. Applying artificial intelligence to drug research can significantly improve the efficiency of drug development. In recent years, graph neural networks have become one of the most active directions in artificial intelligence, and they offer significant advantages in tasks involving chemical molecular structures. This paper therefore works from two perspectives, "early evaluation of toxicity" and "improvement of biological activity", and applies the theory and methods of graph neural networks to the toxicity prediction of drug-like small molecules and the de novo design of active molecules. On the one hand, a well-performing molecular mutagenicity prediction model based on graph neural networks was constructed, and new structural alerts for mutagenicity were discovered. Furthermore, a mutagenicity prediction model based on the graph attention mechanism was constructed that can estimate the uncertainty of its predictions, thereby improving the usability of the model. On the other hand, a target-specific de novo design model for drug-like small molecules was developed for the main protease of the novel coronavirus (SARS-CoV-2), and two structurally novel main protease inhibitors were designed. The specific research contents of this paper are as follows. In the study of molecular toxicity prediction, this paper uses graph convolutional neural networks to construct a mutagenicity prediction model for drug-like small molecules. In this study, 6,307 drug-like small molecules were collected from multiple databases and published literature as the training set, and another 
1,383 molecules were collected as the validation set. After 500 rounds of hyperparameter search using Hyperopt, the best parameter combination was determined and the final model (MutagenPred-GCNN) was trained on all molecules. In five-fold cross-validation the model reached an AUC of 0.8782 and an ACC of 0.8098, comparable to or better than the current best model in commercial software. Deep learning has a strong capacity for automatic feature extraction, and this study uses graph convolutional neural networks to learn molecular representations, called graph fingerprints (GFP), directly from molecular graph data. Traditional machine learning models were then built on GFP; the support vector machine model achieved an AUC of 0.9526 and an ACC of 0.8874 in cross-validation. This indicates that GFP-based models can outperform the current best model, and that GFP captures effective molecular structure information that can serve as input to other traditional machine learning models to improve their prediction performance. Furthermore, this paper used GFP to study toxic groups related to mutagenicity. The results showed that GFP not only captures known toxic groups but can also reveal new ones (such as thiocarbamic acid), indicating that GFP is interpretable to some degree and can provide new ideas for studying the toxicity mechanisms of drug-like small molecules. In a further study, a mutagenicity prediction model for drug-like small molecules that can estimate predictive uncertainty is constructed based on the graph attention mechanism. Existing deep learning models for toxicity prediction have complex architectures and differ in their applicability and target tasks. It is therefore necessary to understand under what circumstances, and to what extent, a model’s results are reliable. To address this issue, this study uses multiple uncertainty estimation methods to evaluate the model’s predictions while improving its 
performance, demonstrating the important role that assessing the reliability of model predictions plays in practical drug development. The mutagenicity prediction model based on the graph attention mechanism in this chapter (Trust_AmesPred) achieved an AUC of 0.8682 and an ACC of 0.7987 in cross-validation, and an AUC of 0.8616 and an ACC of 0.7941 in external validation, which is comparable to the current best model. This study estimates the uncertainty of the model’s predictions and evaluates the predictions from the perspective of confidence. Four methods were used to estimate predictive uncertainty, and their ranking abilities were compared. Across the five evaluation indicators of AUC, ACC, MCC, SEN, and SPC, the lower a molecule’s uncertainty value, the more reliable the model’s prediction and the more likely it is to be correct. Among the four methods, the bootstrapping-based uncertainty estimation method has the largest area under the curve for ACC, MCC, and SEN, reaching 0.7825, 0.6648, and 0.7324, respectively, which indicates that it has the strongest ranking ability and can serve as a reference index for molecule screening and ranking. In addition, to evaluate the applicability of the model, this study defined its applicability domain based on the Euclidean distance between molecules in the latent space: the model is considered applicable to a test molecule when the distance between that molecule and the training molecules is less than 1.3773. To validate the prediction performance of the model and the ranking ability of the uncertainty estimation method, this study also conducted a case study on new molecular entities approved by the FDA in the past three years. The results showed that the model made correct predictions for 54 out of 64 molecules within the domain, and the model correctly predicted all the molecules in the 
top 50% ranked by uncertainty value. In addition, the model was used to predict and rank the compounds in the HERB database, and the selected natural products were experimentally validated using the micro-fluctuation Ames test. The experimental results for two of the three molecules were consistent with the model’s predictions. Finally, a free online server for mutagenicity prediction of drug-like small molecules was built on this model for use by drug development researchers. In the study of molecular activity, de novo design of target-specific drug-like small molecules was carried out. De novo molecular design can generate molecules with desired physicochemical properties for specific targets. Compared with direct large-scale screening, this approach can reduce the cost and time of drug development and improve its success rate. In this part of the study, a de novo design model for small molecules was built on long short-term memory networks. Using transfer learning, a target-specific de novo design model for drug-like small molecules was constructed for the SARS-CoV-2 main protease, and active drug-like molecules were designed. The properties of the designed molecules were computed to verify the generative performance of the model. The models before and after transfer learning were used to design 500,000 molecules. The validity, novelty, and uniqueness of the molecules designed before transfer were 0.9973, 0.9525, and 0.9996, respectively, and those of the molecules designed after transfer were 0.9248, 0.9668, and 0.0652, respectively, indicating that both models can design structurally novel and reasonable drug-like small molecules. To perform ligand-based virtual screening of the designed molecules, four machine learning models and two graph neural networks were trained to classify the inhibitory activity of molecules, and all six models were evaluated by cross-validation and external validation. In 
cross-validation, the AUC, ACC, SEN, and SPC values of the machine learning models were 0.8814-0.8930, 0.7795-0.8305, 0.7394-0.8084, and 0.6969-0.7800, respectively. In external validation, these models showed comparable predictive performance. These results indicate that machine learning models can classify the inhibitory activity of drug-like small molecules. A total of 6,963 molecules were designed using the target-specific de novo design model. Using molecular docking and machine learning, the molecules were subjected to structure-based and ligand-based virtual screening, which yielded 3,938 potential inhibitors. These molecules were then clustered, and the molecule with the highest docking score in each cluster was selected for molecular dynamics simulations. Through molecular dynamics simulations, 78 inhibitors were further screened and validated, ultimately yielding two structurally novel and stable drug-like small molecules that can bind to the SARS-CoV-2 main protease. In summary, the research work completed in this paper includes: (1) construction of a mutagenicity prediction model for drug-like small molecules based on graph convolutional neural networks, on the basis of which it is demonstrated that graph-based molecular representations effectively extract small-molecule information and reveal new mutagenic toxic groups (such as thiocarbamic acid); (2) a study of the mutagenicity of drug-like small molecules based on graph attention mechanisms and uncertainty estimation methods, which not only yields a more usable prediction model but also demonstrates the value of sound uncertainty estimation in model prediction, providing additional reference indicators for molecule screening and ranking; (3) construction of a complete computer-aided de novo design strategy for drug-like small molecules, which provides two new potential inhibitors of the SARS-CoV-2 main protease and offers novel lead compounds for the development of 
COVID-19 drugs. |
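
The applicability-domain rule described in the abstract (a test molecule is in domain when its Euclidean distance to the training molecules in latent space is below 1.3773) can be illustrated with a minimal sketch. The abstract does not say how the distance to the training set is aggregated, so the nearest-neighbor distance used here is an assumption, as are the function names and the toy two-dimensional vectors; only the threshold value comes from the text.

```python
import math

# Threshold reported in the abstract for the applicability domain.
AD_THRESHOLD = 1.3773


def nearest_training_distance(query, training_vectors):
    """Smallest Euclidean distance from `query` to any training vector.

    Assumption: "distance to the training molecules" is taken as the
    nearest-neighbor distance; the abstract does not specify this.
    """
    if not training_vectors:
        raise ValueError("training_vectors must not be empty")
    return min(math.dist(query, t) for t in training_vectors)


def in_applicability_domain(query, training_vectors, threshold=AD_THRESHOLD):
    """True when the query lies strictly within `threshold` of the training set."""
    return nearest_training_distance(query, training_vectors) < threshold


# Toy 2-D latent vectors (hypothetical values, for illustration only).
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
near = (0.5, 0.5)  # close to the first two training points
far = (5.0, 5.0)   # far from every training point

print(in_applicability_domain(near, train))
print(in_applicability_domain(far, train))
```

In practice the latent vectors would come from the trained graph attention model, and predictions for molecules outside the domain would be flagged as less trustworthy rather than discarded outright.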