
Scaffold-based Deep Generative Model For Drug Molecule Design

Posted on: 2024-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: T X Xu
GTID: 2544307055972869
Subject: Pharmacy
Abstract/Summary:
Rational drug design can save manpower and material resources, reduce cost and time, and improve the efficiency of new drug development. The chemical space of compounds is extremely vast and complex, with about 10^23 to 10^60 synthesizable molecules, which makes it difficult for humans to explore. In recent years, with growing computing power and the open sharing of data, deep learning has developed rapidly in the field of drug design, and many deep generative models have been applied to drug molecule design. However, current deep generative models for drug molecule design have the following shortcomings: (1) Few of them are scaffold-based; most are random molecule generation models, which cannot generate new molecules from a specific scaffold and therefore have a narrow scope of application. (2) Existing scaffold-based models focus only on whether the new molecule contains the specified scaffold; they ignore scaffolds with activity similar to that of the specified scaffold, lack scaffold generalization ability, and offer insufficient control over the physicochemical properties of molecules. (3) Stereochemical information is ignored when the training dataset is built and selected, so the models cannot control the stereochemical properties of the generated molecules. (4) The models must be operated through code instructions, which is complex and raises the barrier to use, making them difficult to use and promote.

To address these issues, this thesis proposes a scaffold-based deep learning molecular generation model that takes molecular stereoisomerism into account, and designs a user-friendly graphical user interface for it. The model is a conditional variational autoencoder (CVAE) with gated recurrent units (GRU), composed of an encoder (inference network), a prior network, and a decoder (generation network). It represents molecules as molecular graphs, uses the molecular scaffold and physicochemical properties as constraints, and generates new molecules by gradually adding atoms and bonds to the scaffold. The model can control various physicochemical properties of the generated molecules and supports multi-objective optimization. It can also learn the relationship between molecular structure and stereoisomerism and control the chirality of molecular scaffolds. Moreover, it can generate new molecules whose three-dimensional structures and pharmacophore features are similar to those of a given molecule, giving it three-dimensional shape generalization ability. The model has broad application prospects in drug molecule design.

Before model training, we used an optimized HierS scaffold extraction method to extract scaffolds from COCONUT (COlleCtion of Open NatUral producTs), the largest open online natural product database. The molecules in the database were then encoded with stereochemical information, and multiple physicochemical property labels were provided for each molecule and scaffold. During model training, molecules were represented as molecular graphs, with atoms and bonds encoded as nodes and edges. The molecular scaffold and physicochemical properties were used as constraints, and the encoder, prior network, and decoder were used to calculate the reconstruction loss and the Kullback-Leibler (KL) loss, which were optimized through backpropagation. During molecule generation, latent variables were sampled from the trained prior network, and the decoder generated new molecules by progressively adding atoms and bonds to the scaffold graph. The model also used the real-time ultrafast shape recognition with pharmacophoric constraints (USRCAT) algorithm to search the entire dataset for molecules similar to a given molecule in 3D structure and pharmacophore features. These molecules were then used as scaffolds to generate new molecules with 3D shapes and pharmacophore features similar to those of the given molecule, achieving scaffold generalization.

We extensively validated the model in terms of physicochemical property distribution, validity, uniqueness, novelty, diversity, property control, chirality control, and the binding energy of the generated molecules with target receptors. The results showed that the model learned and reproduced the overall physicochemical property distribution of the training set molecules well. It demonstrated good validity, uniqueness, and novelty for any given scaffold, and the diversity of the generated molecules was better than that of the test set molecules. The model could control multiple molecular properties simultaneously, generating molecules with property values close to the targets, and could generalize the scaffold based on the 3D shape of the given scaffold to generate new molecules with similar activity. When the scaffold molecule contained three or fewer chiral atoms, the model could effectively control the chirality of the generated molecular scaffolds.
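The training objective described in the abstract combines a reconstruction loss with a KL loss between the encoder (inference network) and the prior network. The following is a minimal sketch of how those two terms might be combined, not the thesis's implementation. It assumes that both networks output diagonal Gaussian distributions parameterized by a mean and a log-variance, which the abstract does not state; the function names and the `kl_weight` parameter are hypothetical.

```python
import math


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over latent dimensions.

    Assumption: q is the encoder output and p is the prior-network output,
    each given as per-dimension means and log-variances.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def cvae_loss(reconstruction_nll, mu_q, logvar_q, mu_p, logvar_p, kl_weight=1.0):
    """Total loss = reconstruction loss + kl_weight * KL loss.

    reconstruction_nll stands in for the decoder's negative log-likelihood of
    the atom and bond additions; kl_weight is a hypothetical tuning knob.
    """
    return reconstruction_nll + kl_weight * gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
```

In a real training loop these quantities would be tensors in a deep learning framework so that gradients can flow through backpropagation; plain floats are used here only to keep the arithmetic visible.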
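Validity, uniqueness, and novelty are named above as evaluation criteria, but the abstract does not define them. The sketch below uses definitions that are common in the molecular generation literature and may differ from those used in the thesis. It assumes molecules are given as canonical string identifiers (for example canonical SMILES), and `is_valid` is a hypothetical caller-supplied predicate, such as a check that a cheminformatics toolkit can parse the string.

```python
def generation_metrics(generated, training_set, is_valid):
    """Compute validity, uniqueness, and novelty for a batch of generated molecules.

    Assumed definitions (not confirmed by the source):
      validity   = valid molecules / all generated molecules
      uniqueness = distinct valid molecules / valid molecules
      novelty    = distinct valid molecules absent from the training set
                   / distinct valid molecules
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }
```

For a scaffold-conditioned model, these ratios would typically be computed per scaffold and then summarized, since the abstract reports them "for any given scaffold".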
Keywords/Search Tags: deep learning, drug design, molecular generation model, stereochemistry, graphical user interface