| The use of artificial intelligence for small-molecule design and new drug development is an active research topic in the pharmaceutical field. In computational molecular science, the generative design of new drug molecules and the analysis of their structure and properties are central problems. Two problems are of particular practical value in molecule-related applications. The first is molecular property prediction: by analyzing the structure and characteristics of a given molecule, one can predict properties such as water solubility, drug-likeness, or affinity for specific proteins, which can substantially reduce the cost of the corresponding assays. The second is the generation of molecules with specified properties. In practical drug design, researchers are limited to known data sets and can only screen a given data set for compounds with the desired properties. Conditional generative models can instead propose candidate molecules with the specified properties, helping to accelerate drug discovery. This paper proposes a motif-based molecular generation method. A motif extraction algorithm extracts common motifs that are chemically meaningful and stable, establishes a mapping between the atoms of a drug molecule and its motifs, and converts the molecular graph into a motif graph. Molecule generation is thereby transformed into the problem of adding new nodes and connections on the motif graph. Compared with ordinary molecular graphs, motif graphs improve generation efficiency and alleviate the combinatorial explosion that arises during generation. Compared with atom-based generation, this approach has a natural advantage in ensuring the chemical validity of the generated drug molecules, because each motif is itself a reasonable, existing structure. In addition, the substructures provide sufficient chemical prior knowledge for graph representation learning on molecules.
This paper also combines the above generation method with a conditional variational autoencoder to build a motif-based drug molecule autoencoder. The model consists mainly of a graph encoding network, a property prediction network, and a molecule generation network. The graph encoding network converts drug molecules into fixed-length vectors, providing molecular representations for this model or for other tasks. The property prediction network predicts the specified properties of a drug molecule. The molecule generation network uses the given properties and molecular representations to generate molecules that meet the specified properties, helping researchers quickly narrow the scope of screening. Experiments show that the model meets expectations and performs well on the above tasks. |
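
The conversion from a molecular graph to a motif graph described above can be illustrated with a minimal sketch. It does not reproduce the motif extraction algorithm of this paper: the motif assignment is assumed to be given as sets of atom indices, and the function name `build_motif_graph` and the toluene example are hypothetical, chosen only for illustration. The sketch builds the atom-to-motif mapping and then connects two motifs whenever they share an atom or are joined by a bond.

```python
from collections import defaultdict


def build_motif_graph(bonds, motifs):
    """Collapse an atom-level graph into a motif-level graph.

    bonds  : iterable of (atom_i, atom_j) index pairs
    motifs : list of sets of atom indices, one set per motif
             (assumed to come from some motif extraction step)
    Returns (atom_to_motifs, motif_edges), where motif_edges is a set of
    (m, n) motif-index pairs with m < n.
    """
    # Mapping between atoms and the motifs that contain them.
    atom_to_motifs = defaultdict(set)
    for motif_id, atoms in enumerate(motifs):
        for atom in atoms:
            atom_to_motifs[atom].add(motif_id)
    atom_to_motifs = dict(atom_to_motifs)

    motif_edges = set()

    # Motifs that share an atom (e.g. fused rings) are connected.
    for owners in atom_to_motifs.values():
        for m in owners:
            for n in owners:
                if m < n:
                    motif_edges.add((m, n))

    # Motifs joined by a bond between their atoms are connected.
    for i, j in bonds:
        for m in atom_to_motifs.get(i, ()):
            for n in atom_to_motifs.get(j, ()):
                if m != n:
                    motif_edges.add((min(m, n), max(m, n)))

    return atom_to_motifs, motif_edges


# Hypothetical example: toluene heavy atoms.
# Atoms 0-5 form the benzene ring, atom 6 is the methyl carbon.
toluene_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
toluene_motifs = [{0, 1, 2, 3, 4, 5}, {6}]

atom_to_motifs, motif_edges = build_motif_graph(toluene_bonds, toluene_motifs)
print(atom_to_motifs)
print(motif_edges)
```

In this example a graph of seven atoms and seven bonds reduces to two motif nodes and one edge, which is the kind of reduction that makes generation on the motif graph cheaper than atom-by-atom generation.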