| In recent decades, with rising demand for a better quality of life and growing concern for public health, drug development has become increasingly important. Molecular property prediction is one of the fundamental tasks in drug development, as many downstream applications rely on it to evaluate, select, and generate molecules. With the continuous development of artificial intelligence, many machine learning algorithms have been proposed and widely applied in drug development, demonstrating enormous potential. Molecules are represented in computers roughly as either sequences or molecular graphs, with the latter further divided into two-dimensional and three-dimensional molecular graphs. For each type of molecular input, numerous machine learning algorithms have emerged, each with its own advantages, disadvantages, and focus. Currently, most methods can analyze only a single molecular representation and cannot comprehensively consider the various kinds of information a molecule carries. In this article, three methods based on siamese neural networks are proposed. They combine different molecular representation types and neural network models to analyze the structure, properties, and other information of molecules from multiple perspectives, in order to improve the accuracy of molecular property prediction. First, we propose a novel model called PSGS (Pseudo-Siamese Graph and Sequence network) based on Simplified Molecular Input Line Entry System (SMILES) sequences and two-dimensional molecular graph representations. Our model uses a pseudo-siamese neural network to calculate the similarity between graph and sequence representations, with the aim of improving model generalization by treating molecular consistency as an additional self-supervised task. Furthermore, we use a fusion layer to combine the representations of sequences and molecular graphs to further improve the accuracy of molecular
property prediction by combining different models. Second, we propose a pseudo-siamese neural network model based on 2D and 3D molecular graph representations, aimed at integrating local atom and chemical bond information from different molecular representations. A molecule can be naturally represented as a graph, with nodes representing atoms and edges representing the chemical bonds between them. Different models learn different information from the atoms and bonds. The traditional two-dimensional molecular graph encodes only the topological information of the molecule and ignores its geometric information, which is contained in the molecule's three-dimensional spatial structure. We use a 3D geometry-based graph neural network architecture to model atoms, bonds, and bond angles simultaneously, and use the pseudo-siamese network to constrain and optimize the representations of atoms and chemical bonds. Finally, to comprehensively learn molecular representations for predicting molecular properties, we propose a pseudo-siamese neural network model that combines sequence, 2D molecular graph, and 3D molecular graph representations. The molecular representations learned by different networks should be consistent, which means that different representations of the same molecule should be more similar to one another than representations of different molecules. Our model uses a pseudo-siamese network that considers both the global consistency between molecules and the local consistency between atoms, and combines them to capture the properties of molecules more comprehensively. |
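
The consistency idea described above, that two representations of the same molecule should be more similar than representations of different molecules, can be illustrated with a small sketch. The abstract does not state which objective the models use, so the InfoNCE-style contrastive loss below is an assumed stand-in; the function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def consistency_loss(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss (an assumed choice, not confirmed by the source).

    view_a[i] and view_b[i] are embeddings of the same molecule produced by
    two different encoders, e.g. a sequence branch and a graph branch.
    The loss is low when each molecule's two views are closer to each other
    than to the views of other molecules in the batch.
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_norm - logits[i]
    return total / len(view_a)

# Toy embeddings for three molecules (made-up numbers, for illustration only).
seq_emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
graph_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Same graph embeddings, but paired with the wrong molecules.
graph_shuffled = [graph_aligned[1], graph_aligned[2], graph_aligned[0]]

aligned = consistency_loss(seq_emb, graph_aligned)
shuffled = consistency_loss(seq_emb, graph_shuffled)
```

With these toy inputs, `aligned` comes out smaller than `shuffled`, which is the behavior a consistency term is meant to reward. In a trained model such a term would presumably be added to the supervised property-prediction loss, and the same pattern could be applied per atom for local consistency.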