
Research On Molecular Generation Methods Based On Autoencoders

Posted on: 2022-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: X L Wang
GTID: 2511306320966639
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, deep learning has been applied to a growing number of fields, and work at the intersection of computer science and other disciplines is increasing; molecular generation is an important part of this trend. The molecular space is extremely large, and known molecules make up only a small part of it. To find new, potentially useful molecules, researchers have used deep learning methods to generate molecules and have proposed a large number of molecular generative models. These models usually require the input and output to be of the same type: if the input is a string, the output is a string of the same kind; if the input is an adjacency matrix, the output is an adjacency matrix; and so on. To make molecular data easy for computers to process, molecules are usually expressed and stored in the Simplified Molecular Input Line Entry System (SMILES), which represents a molecule as a sequence of characters. Existing works either use SMILES strings directly or convert them into graph structures such as adjacency matrices as model input. Most of them, however, require the input and output to be of the same type and do not consider the relationships among different molecular characteristics, such as between molecular scaffolds and branched chains, or between atomic distribution and molecular structure. To address this, we propose three methods that generate molecules by establishing a relationship between input and output.

First, we present a sequence-based molecular generative model, Core2Chains (Core-to-Chains). We represent molecules as SMILES strings and divide each one into two parts: a molecular core and its molecular chains. The model takes the core as input and produces the chains as output; combining the input and output yields a complete molecule. To improve the diversity of the generated molecules, we introduce Gaussian noise into the hidden space of the model, so that different chains can be generated from the same core.

Second, we propose a graph-based molecular generative model named A2Str. A molecule can be expressed intuitively as a graph, with atoms as nodes and chemical bonds as edges. The nodes of a graph and the connections between them are usually expressed as an adjacency matrix, which in general can represent a complete graph structure. However, a molecular graph contains many types of nodes (atoms) and edges (chemical bonds), and a single adjacency matrix cannot hold all of this information. We therefore divide the molecular graph into two parts: a node feature matrix dedicated to representing node types, and an edge feature matrix dedicated to representing the connections between nodes. With the former as the model input and the latter as the model output, the two together represent all the information in a molecular graph and establish the relation between nodes and edges.

Finally, we propose a molecular generative model based on molecular fingerprints. A molecular fingerprint represents the substructures present in a molecule and the connections between them, and is generally expressed as a string of 0/1 bits. A molecule corresponds to a unique molecular fingerprint, but a fingerprint does not necessarily correspond to a unique molecule. Given a molecular fingerprint, that is, given some known substructures and their connections, the model aims to infer the real structure of the molecule.
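To make the node/edge split used by the graph-based model more concrete, the following is a minimal sketch of how a molecular graph might be separated into a node feature matrix and an edge feature matrix. The abstract does not specify the actual encoding, so the atom vocabulary, the bond-type codes, the helper function names, and the hand-written toy molecule (the heavy atoms of acetaldehyde, C-C=O) are all assumptions made for illustration only.

```python
# Illustrative sketch only: the encoding below is assumed, not taken from the thesis.
ATOM_TYPES = ["C", "N", "O"]              # assumed atom vocabulary
BOND_TYPES = {"single": 1, "double": 2}   # assumed bond codes; 0 means "no bond"

# Toy molecule written by hand (heavy atoms of acetaldehyde: C-C=O).
atoms = ["C", "C", "O"]
bonds = [(0, 1, "single"), (1, 2, "double")]


def node_feature_matrix(atoms, atom_types):
    """Return one one-hot row per atom; this encodes node (atom) types only."""
    return [[1 if atom == t else 0 for t in atom_types] for atom in atoms]


def edge_feature_matrix(n_atoms, bonds, bond_types):
    """Return a symmetric n x n matrix of bond-type codes (0 = no bond)."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, kind in bonds:
        matrix[i][j] = matrix[j][i] = bond_types[kind]
    return matrix


X = node_feature_matrix(atoms, ATOM_TYPES)            # would play the role of model input
E = edge_feature_matrix(len(atoms), bonds, BOND_TYPES)  # would play the role of model output

for row in X:
    print(row)
for row in E:
    print(row)
```

Under this assumed encoding, `X` carries only the atom types and `E` carries only the connectivity and bond types, so neither matrix alone describes the molecule, while the pair together does. A model of the kind described would take something like `X` as input and predict something like `E`.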
Keywords/Search Tags: Molecular generation, Autoencoder, SMILES string, Molecular graph, Molecular fingerprint