| The COVID-19 outbreak and the persistence of chronic hepatitis B show that outbreaks of acute viral infections and the growing number of chronic viral infections remain major threats to public health security. According to statistics, more than 200 viruses can infect humans, but specific therapeutic drugs currently exist for only 10 of them, and no effective therapeutic drugs have yet been found for many other highly infectious and pathogenic viruses. Given the high genetic variability of viruses and the single mechanism of action of existing antiviral drugs, viral resistance continues to emerge, and there is an urgent need to discover novel antiviral compounds. This need arises not only from the current crisis of viral infections but also from the need to prepare for outbreaks of new or recurrent viral infectious diseases. In recent years, machine learning methods have shown great potential in drug discovery, significantly improving the efficiency of drug research and development. This paper mainly uses computer-aided drug screening. Taking SARS-CoV-2 and HBV as examples, cheminformatics combined with machine learning methods was used to develop classification models that predict the antiviral activity of compounds. The models were then used to screen herb-derived compounds and FDA-approved drugs to discover novel antiviral compounds and medicinal plants. The main content of the paper is as follows. The first chapter reviews the current status and challenges of antiviral drugs and analyzes the feasibility of screening novel antiviral compounds with cheminformatics and machine learning approaches. In the second chapter, herb-derived compounds were screened by constructing classification models to identify active anti-SARS-CoV-2 compounds and medicinal plants. First, we constructed a benchmark dataset from anti-SARS-CoV-2 bioactivity data collected from the ChEMBL database. The benchmark dataset was then used with multiple machine learning
methods to build multiple classification models. In the model comparison, the RF and SVM models achieved satisfactory predictive performance, with mean AUC values of 0.90. Using these two models, a total of 1011 compounds from the TCMSP were predicted to be active against SARS-CoV-2, six of which have been confirmed to have antiviral activity in previous studies. Molecular fingerprint similarity analysis showed that only 24 compounds are highly similar to FDA-approved antiviral drugs, indicating that most of the predicted compounds have novel structures. Meanwhile, molecular docking analysis showed that 15 compounds could dock specifically to the 3CLpro protein, 221 compounds could dock specifically to the RdRp protein, and 278 compounds could dock to both 3CLpro and RdRp. Based on the predicted anti-SARS-CoV-2 compounds, we identified 74 anti-SARS-CoV-2 medicinal plants through enrichment analysis, such as licorice, Epimedii Herba, and Scutellariae Radix. The 74 plants are widely distributed across 68 genera and 43 families, and 14 of them are antipyretic and detoxifying plants. In the third chapter, existing FDA-approved drugs were screened by constructing classification models to identify active anti-HBV compounds for experimental testing. Similarly, we collected anti-HBV bioactivity data from the ChEMBL database and constructed a benchmark dataset. After hyperparameter optimization and model evaluation, we obtained the optimal RF and SVM models, with average AUC values of 0.98 and 0.97, respectively. The two models were then used to screen existing FDA-approved drugs, and a total of 984 drugs were predicted by both models to have anti-HBV activity. Finally, five drugs with novel structures and high prediction probabilities in both models were selected for HBV inhibitory activity testing on HepG2.2.15 cells. The results showed that these five drugs had a certain inhibitory effect on HBV, especially on HBsAg, with an inhibition rate of over 50%. This indicates that our workflow can also be applied to other viruses to construct prediction models for screening
potentially active compounds. In summary, taking SARS-CoV-2 and HBV as examples, we developed classification models based on cheminformatics combined with machine learning methods to predict the antiviral activity of compounds. This study not only provides an attractive starting point and a broader scope for mining antiviral medicinal plants and exploring antiviral drugs but also accelerates the in-depth analysis of antiviral drug discovery from herbal medicines. It also provides new ideas for identifying antiviral drugs with novel structures among already approved drugs. |
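
The selection step summarized above (keeping compounds that both models predict as active and that are structurally dissimilar to approved antiviral drugs) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that fingerprints are represented as sets of on-bit indices, that similarity is measured with the Tanimoto coefficient (a common choice that the abstract does not name), and that the cutoffs `p_cut=0.5` and `sim_cut=0.7` are reasonable. The compound names, fingerprints, and probabilities are hypothetical toy values.

```python
from dataclasses import dataclass


def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


@dataclass(frozen=True)
class Candidate:
    name: str
    fingerprint: frozenset  # on-bit indices of a hypothetical fingerprint
    p_rf: float             # activity probability from an RF model (toy value)
    p_svm: float            # activity probability from an SVM model (toy value)


def select_novel_consensus_hits(candidates, reference_fps, p_cut=0.5, sim_cut=0.7):
    """Keep candidates that both models call active and that are dissimilar to all references.

    p_cut and sim_cut are assumed thresholds, not values taken from the paper.
    Returns (name, consensus probability, max similarity) tuples, best first.
    """
    hits = []
    for cand in candidates:
        # Consensus filter: both models must predict activity.
        if cand.p_rf < p_cut or cand.p_svm < p_cut:
            continue
        # Novelty filter: highest similarity to any reference drug must be low.
        max_sim = max(
            (tanimoto(cand.fingerprint, ref) for ref in reference_fps),
            default=0.0,
        )
        if max_sim < sim_cut:
            # Use the lower of the two probabilities as a conservative score.
            hits.append((cand.name, min(cand.p_rf, cand.p_svm), max_sim))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical reference fingerprints standing in for approved antiviral drugs.
reference_fps = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]

# Hypothetical screening candidates.
candidates = [
    Candidate("cmpd_A", frozenset({1, 2, 3, 5}), 0.95, 0.90),  # moderately similar
    Candidate("cmpd_B", frozenset({20, 21, 22}), 0.80, 0.85),  # structurally novel
    Candidate("cmpd_C", frozenset({30, 31}), 0.90, 0.30),      # fails consensus
    Candidate("cmpd_D", frozenset({1, 2, 3, 4}), 0.99, 0.99),  # identical to a reference
]

hits = select_novel_consensus_hits(candidates, reference_fps)
for name, score, max_sim in hits:
    print(f"{name}: consensus probability {score:.2f}, max similarity {max_sim:.2f}")
```

In this toy run, `cmpd_A` and `cmpd_B` pass both filters, `cmpd_C` is dropped because only one model predicts activity, and `cmpd_D` is dropped because it matches a reference fingerprint exactly. Ranking by the lower of the two probabilities is one conservative design choice; averaging the probabilities would be an equally plausible alternative.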