Development And Application Studies Of Matched Molecule Pair Analysis And Active Learning In Drug Design

Posted on: 2022-04-10  Degree: Doctor  Type: Dissertation
Country: China  Candidate: X Y Ding
GTID: 1484306482996749  Subject: Drug design
Abstract/Summary:
Iterative molecule design is a process of directed evolution in drug discovery. The classic design cycle consists of multiple rounds of design, synthesis, testing, and analysis, and it is laborious, time-consuming, and costly. In recent years, advances in computing resources and drug design methodology have offered new ways to accelerate this complex process. This thesis focuses on the development and application of new methodologies, namely matched molecular pair analysis (MMPA) and active learning (AL), to help tackle difficult scientific problems in drug design, including bioactivity prediction in lead optimization, improved estimation of ADME/T properties, and the study of molecular mechanisms of action (MOA). The dissertation consists of three parts. The first develops quantitative bioactivity prediction models based on matched molecular pair and matched molecular series methods (chapter 2). The second develops and applies active learning methods to improve the estimation of ADME/T (absorption, distribution, metabolism, excretion, and toxicity) properties, taking the plasma exposure of orally administered drugs (AUCpo) as a case study (chapter 3). The third uses matched molecular pair analysis, crystal structure analysis, molecular dynamics simulation, and other methods to study the MOA of functionally switchable modulators of retinoic acid-related orphan receptor gamma-t (RORγt) (chapter 4).

Enhancing a compound's biological activity is a crucial task in lead optimization for small-molecule drug discovery. However, conducting multiple iterative rounds of compound synthesis and bioactivity testing is time-consuming and laborious. High-quality in silico bioactivity prediction methods are therefore in high demand to prioritize more active derivatives and reduce trial and error. In chapter 2, based on the large-scale structure-activity relationship (SAR) data in the ChEMBL database, we constructed two kinds of bioactivity prediction models. The first kind is based on the similarity of substituents and is realized by MMPA; it includes SA, SA?BR, SR, and SR?BR. The second kind is based on SAR transferability and is realized by matched molecular series (MMS) analysis; it includes single MMS pair, full MMS series, and multi single MMS pairs. We also used a distance-based threshold to define the applicability domain of the models. Among these seven individual models, the multi single MMS pairs model performed best (R²=0.828, MAE=0.406, RMSE=0.591), and the baseline model (SA) showed the lowest prediction accuracy (R²=0.798, MAE=0.446, RMSE=0.637). Finally, a consensus model (R²=0.842, MAE=0.397, RMSE=0.563) was built that outperformed all individual models. This study is expected to provide a valuable tool for medicinal chemists designing high-affinity analogs during lead optimization.

Artificial intelligence (AI) technology is playing an increasingly important role in drug discovery research. However, the success of AI models has been limited by their need for large amounts of annotated training data, which conflicts with current drug discovery pipelines that often strive to characterize as few compounds as possible. Recently, AL has attracted attention in the community because it needs only minimal data to train and update AI models. In chapter 3, we examined a low-data scenario, comparing different AL schemes for modeling the plasma exposure of orally administered drugs, one of the key pharmacokinetic parameters for drug candidate evaluation. Entropy-based uncertainty metrics produced more powerful predictive models without exhaustive experimentation and learned them much faster than other experiment selection strategies, reducing the amount of data annotation by more than 70%. An entropy sampling strategy was therefore adopted to navigate a large accessible chemical space, select compounds for experimental testing, and feed the results back into the model. Two rounds of adaptive screening and model retraining were carried out, with 10 new experiments added in each round. The results indicated that plasma exposure prediction can be substantially improved by adding only dozens of experimental data points. For the first time, we experimentally verified the potential of AL for addressing the low-data issues of drug discovery in a close to real-world application. This study is expected to provide insights for improving the accuracy and generalization capability of AI models, and it is also of reference value for those planning to implement AL workflows in their drug discovery pipelines.

Small-molecule RORγt inverse agonists and agonists have potential for the treatment of various autoimmune diseases and cancer. Although some modulators with similar structures but different functional types have been found, their detailed molecular mechanisms remain to be studied. In chapter 4, we first performed MMPA on currently reported RORγt ligands and found the functional switch phenomenon: pairs of structurally similar molecules that exhibit opposite MOA (MOA cliffs). Two switch patterns were observed, "short" inverse agonist to agonist and agonist to "long" inverse agonist, but two successive functional reversals within one scaffold ("short" inverse agonist to agonist, then to "long" inverse agonist) had not been found. Then, in cooperation with an experimental group, the crystal structure of RORγt in complex with the carbazole-based inverse agonist 6 was determined, and a series of modulators was obtained through structure-based rational drug design. Next, the cocrystal structures of RORγt in complex with the agonist 7d and the "long" inverse agonist 7h were revealed by X-ray analysis. Finally, to further reveal the molecular mechanisms of the different RORγt modulators, we performed 1-microsecond molecular dynamics simulations on these three representative complexes. In the agonist-bound system, agonist 7d stabilized the Y502-H479 hydrogen bond and thereby the AF2 region, consistent with previous reports. Analysis of the side-chain dihedral angles of H479 and Y502 showed that these two residues flipped in opposite directions when the "short" inverse agonist 6 or the "long" inverse agonist 7h was introduced. By monitoring the RMSDs of H11-H11'-H12 and the intrahelical hydrogen-bond distances, we proposed two models to describe these different behaviors. The "short" inverse agonist destabilizes H11' and dislocates H12; coactivators cannot be recruited, but corepressors still can because the helical structure of H12 remains intact. The "long" inverse agonist separates H11 and unwinds H12, which prevents cofactor peptide recruitment altogether. The models explain the differences in cofactor peptide recruitment among different ligands, which may help identify novel small molecules with different regulatory mechanisms.
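
The entropy sampling step described for chapter 3 can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a classifier that outputs class probabilities for each untested compound, and the function names and toy probabilities below are hypothetical. Only the idea of ranking candidates by predictive entropy and testing a small batch per round (10 in the abstract) comes from the text.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Zero-probability classes contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_batch(pool_probs, batch_size=10):
    """Return indices of the pool compounds with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,  # most uncertain predictions first
    )
    return ranked[:batch_size]

# Hypothetical predicted probabilities (low exposure, high exposure)
# for five untested compounds; the values are illustrative only.
pool_probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.50, 0.50],
    [0.70, 0.30],
]

picked = select_batch(pool_probs, batch_size=2)
print(picked)
```

In a full loop, the selected compounds would be measured, added to the training set, and the model retrained before the next round of selection.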
Keywords/Search Tags: Matched Molecular Pair Analysis, Bioactivity Prediction, Active Learning, Plasma Exposure, RORγt