
Data-Driven Design And Analysis Of Chemical Space For Drug Discovery

Posted on: 2022-01-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Liu
GTID: 1484306482996759
Subject: Drug design
Abstract/Summary:
Chemical space, which is often compared to the universe in its vastness, encompasses all possible small organic molecules. In drug discovery and development, researchers work to find drugs within this space. Drug research is nonetheless a challenging, time-consuming, costly, and high-risk task, because whether a compound can be developed into a drug depends on many factors. A retrospective analysis of the discovery of approved drugs shows that many drugs were initially obtained either from the available chemical space (available compound libraries) or from the virtual chemical space (virtual compound libraries). The key difference between the two kinds of library is whether the compounds in them have been synthesized. Screening compound libraries, fragment compound libraries, and natural product libraries are common available compound libraries, while libraries generated by enumeration algorithms or bioisosteric replacement tools are common virtual compound libraries. Chemical space is so large that both available and virtual compound libraries cover only a small subset of it. Exploring the entire chemical space is impossible; however, it is tractable to design target-specific subsets using data related to the target, and this may accelerate hit discovery.

In the first chapter, we investigated the most cutting-edge technologies for available and virtual compound libraries, respectively: DNA-encoded compound library technology (DELT) and generative models. DELT constructs collections of compounds covalently linked to unique DNA tags, which makes it possible to screen a large pool of library members without preparing each compound in pure form. Generative models are deep neural networks that can convert discrete molecular representations to and from a continuous representation. Both technologies can be used as tools for navigating chemical space. Although both are advanced technologies, DELT and generative models still have considerable room for improvement.

The quality of a DEL plays an important role in the success of subsequent screening experiments, and quality includes a high conversion rate for each building block (BB) used during library synthesis. However, a significant percentage of BBs selected by blind picking are not appropriate for library construction because of their poor conversion rates. To address this problem, in the second chapter a machine learning method was applied to assist in selecting BBs for a DNA-compatible Pictet-Spengler reaction. The machine learning method found high-conversion-rate BBs more effectively than a blind pick method or a random pick method (hit rates of 79.4%, 18.4%, and 11.8%, respectively), which exemplifies the value of machine learning for ensuring the quality of DELs while reducing the cost of DEL construction.

In the third chapter, to reduce the difficulty of DEL design and data analysis caused by a lack of appropriate software, several tools were developed specifically for DELT: a DEL enumeration algorithm, a decoding algorithm, and a data analysis algorithm. The results showed that the DEL enumeration algorithm could be used effectively for DEL design. In addition, screening result files could be quickly decoded by the decoding algorithm and then intuitively depicted by the data analysis algorithm. These tools are expected to greatly simplify DEL design and data analysis, which will significantly increase the efficiency of DELT. We also tested DELT on the target BRD4-BD1. We first enumerated two candidate DELs and then chose a DNA-encoded benzimidazole library for the subsequent screening according to the target information for BRD4-BD1. Finally, we obtained an inhibitor of BRD4-BD1 (IC50 = 229.7 nM).

Some studies have observed high similarity scores between reported active ligands and those designed by generative models. To make the designed hits as novel as possible, in the fourth chapter we developed a physics-based generative model and tested it on the target ALK5. The results showed that the generative model implicitly learned the pocket information of ALK5 and that some high-scoring compounds could be sampled from chemical space. Finally, we selected the compound DC-ALK5003 for validation and found that it was an inhibitor of ALK5 (IC50 = 3.3 μM) that is not similar to any reported ALK5 inhibitor.

Although generative models have opened up the search of unknown chemical space without relying on brute-force exploration, the molecules still need to be synthesized and biologically evaluated, and this trial-and-error process remains resource-intensive. AI-based drug design methods therefore face a major challenge: how to prioritize the molecular structures with potential for subsequent drug development. To address this problem, we developed a molecular filtering method, MolFilterGAN, based on a generative adversarial model. The results showed that MolFilterGAN outperformed conventional drug-likeness or synthetic accessibility metrics such as QED, SA, Fsp3, MCE-1 and BNN. In addition, we found that MolFilterGAN significantly increased the efficiency of molecular triaging in real-world settings. Further evaluation of MolFilterGAN on LIT-PCBA, a high-throughput screening (HTS) bioassay dataset, suggested that MolFilterGAN may have learned to prioritize bioactive molecules over generally accessible molecules, although no molecular target information was included when training MolFilterGAN.

In summary, we discussed two cutting-edge technologies, DELT and generative models, from the perspective of chemical space. The results show that it is possible to quickly find hits for some targets by navigating available or virtual chemical libraries that are carefully designed based on target data. This suggests that the rational use of data can significantly accelerate drug development and that data-driven drug design methods are bound to play an increasingly important role in drug development as technologies advance.
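
The abstract does not describe how the DEL enumeration and decoding tools work internally. As a rough illustration only, a combinatorial DEL can be modeled as the Cartesian product of the building blocks used in each synthesis cycle, with each library member identified by its concatenated DNA tags; decoding then maps sequenced barcodes back to building-block combinations and counts them. The sketch below assumes this simplified model. The building-block names, tag sequences, and function names are hypothetical and are not taken from the dissertation.

```python
from collections import Counter
from itertools import product

# Hypothetical building blocks per synthesis cycle: name -> DNA tag.
CYCLE_1 = {"amine_A": "ACGT", "amine_B": "TGCA"}
CYCLE_2 = {"aldehyde_X": "GGAA", "aldehyde_Y": "CCTT", "aldehyde_Z": "AATT"}
CYCLES = [CYCLE_1, CYCLE_2]


def enumerate_library(cycles):
    """Return {concatenated DNA barcode: tuple of building-block names}."""
    library = {}
    # One library member per combination of one building block from each cycle.
    for combo in product(*(cycle.items() for cycle in cycles)):
        names = tuple(name for name, _ in combo)
        barcode = "".join(tag for _, tag in combo)
        library[barcode] = names
    return library


def decode_reads(reads, library):
    """Count reads per library member, skipping barcodes not in the library."""
    counts = Counter()
    for read in reads:
        member = library.get(read)
        if member is not None:
            counts[member] += 1
    return counts


library = enumerate_library(CYCLES)
reads = ["ACGTGGAA", "ACGTGGAA", "TGCAAATT", "NNNNNNNN"]
counts = decode_reads(reads, library)
for member, n in counts.most_common():
    print(member, n)
```

Real screening data would need error-tolerant barcode matching and normalization of read counts before enrichment is interpreted; this sketch uses exact matching only to keep the idea visible.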
Keywords/Search Tags: Chemical Space, Drug Screening, DELT, Generative Model, Artificial Intelligence