
Research On Molecular Property Prediction And Generation Technology Based On Machine Learning

Posted on: 2024-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: L J Li
GTID: 2544307079459484
Subject: Computer Science and Technology
Abstract/Summary:
Molecular property prediction and molecule generation are key components of computer-aided drug discovery and development, where they can speed up development and reduce research costs. At present, most high-performance models for both tasks are built on machine learning, but existing methods still face several challenges.

In molecular property prediction, most of the best-performing current models are based on deep learning. Such models depend on large amounts of labeled data, and labeling molecular properties accurately is time-consuming and expensive. To improve prediction performance under a limited annotation budget, this thesis proposes a Pre-trained Variational Adversarial Active Learning method (PREVAIL) for selecting which molecules to label. Unlike previous active learning methods, which draw the initial set by random sampling, PREVAIL selects the most informative initial dataset with deep clustering, avoiding biases that would reduce the accuracy of early decisions. In addition, PREVAIL uses task-aware variational adversarial active learning to merge loss information from the property prediction task into the latent space, so that selection accounts for both the distribution of molecules and information from the prediction task.

In molecular generation, molecules have complex structures and properties, and deep learning can extract such complex features efficiently; most existing molecular generation models are therefore based on deep learning. These methods, however, struggle with the validity of the generated molecules and with capturing the semantic information of labels. This thesis therefore proposes Cross Adversarial Learning for Molecular Generation (CRAG), which combines the realism of variational auto-encoder based methods with the diversity of generative adversarial network based methods to better exploit the complex properties of molecules. Specifically, CRAG uses an adversarially regularized encoder-decoder to transform molecules written in the simplified molecular-input line-entry system (SMILES) into discrete variables. These variables are then trained to predict properties and to generate adversarial samples through projected gradient descent.

For conditional molecular generation, this thesis proposes a conditional model based on cross adversarial learning (Cross Adversarial Learning for Conditional Molecular Generation, CCRAG). To generate and optimize molecules with targeted properties, CCRAG extends CRAG with a predictor module that computes mutual information to separate the latent vectors of molecules from the property information. Both CRAG and CCRAG are trained by adversarial learning.

PREVAIL, CRAG, and CCRAG have been evaluated extensively on the QM9 and ZINC datasets, and the experimental results demonstrate the advantages and effectiveness of the proposed models. The models presented in this thesis are therefore expected to support AI-based molecular design in a range of chemical applications and to advance drug discovery, materials science, and related fields.
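The abstract says PREVAIL chooses its initial labeled set by clustering instead of random sampling. The sketch below illustrates that idea only under stated assumptions: molecule embeddings are taken as already computed (for example by a pretrained encoder, not shown), plain k-means with farthest-point initialization stands in for the thesis's deep clustering method, and the molecule nearest each centroid is picked for labeling. All function names are hypothetical, not the author's code.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _farthest_point_init(points: List[Vector], k: int) -> List[List[float]]:
    """Deterministic init: start at points[0], then repeatedly add the point
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: List[Vector], k: int, iters: int = 50) -> List[List[float]]:
    """Lloyd's k-means (a stand-in for deep clustering); returns k centroids."""
    centroids = _farthest_point_init(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to its cluster mean
        # (an empty cluster keeps its previous centroid).
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def select_initial_set(embeddings: List[Vector], budget: int) -> List[int]:
    """Hypothetical helper: indices of `budget` molecules, one per cluster,
    each the not-yet-chosen molecule closest to its cluster centroid."""
    chosen: List[int] = []
    for c in kmeans(embeddings, budget):
        candidates = [i for i in range(len(embeddings)) if i not in chosen]
        chosen.append(min(candidates, key=lambda i: _sq_dist(embeddings[i], c)))
    return chosen
```

For three well-separated groups of toy 2-D embeddings, `select_initial_set(embeddings, 3)` returns one index from each group, whereas a random initial draw could land two picks in the same region. Farthest-point initialization is used here only to keep the toy example deterministic.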
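In variational adversarial active learning, a discriminator learns to tell labeled from unlabeled samples in latent space, and the unlabeled samples it finds least "labeled-looking" are sent for annotation. The abstract says PREVAIL also merges the prediction task's loss into that latent space but does not say how. This sketch therefore assumes a simple fusion rule, appending a predicted loss value to each latent code, and uses a hypothetical fixed logistic function in place of a trained discriminator.

```python
import math
from typing import Callable, List, Sequence

# Assumed discriminator signature: feature vector -> P(sample is labeled).
# In a real system this would be a trained network.
Discriminator = Callable[[Sequence[float]], float]


def task_aware_features(z: Sequence[float], predicted_loss: float) -> List[float]:
    """Assumed fusion rule: append the predicted task loss to the latent code."""
    return list(z) + [predicted_loss]


def select_batch(latents: Sequence[Sequence[float]],
                 predicted_losses: Sequence[float],
                 discriminator: Discriminator,
                 budget: int) -> List[int]:
    """Return the `budget` pool indices with the lowest P(labeled) score."""
    scores = [discriminator(task_aware_features(z, loss))
              for z, loss in zip(latents, predicted_losses)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:budget]


def toy_discriminator(x: Sequence[float]) -> float:
    """Hypothetical fixed logistic model. The negative weight on the last
    feature (predicted loss) means high-loss molecules look 'unlabeled'."""
    w = [0.5, -0.5, -2.0]
    logit = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-logit))
```

With four molecules sharing the same latent code and predicted losses `[0.1, 2.0, 0.5, 1.5]`, `select_batch(..., toy_discriminator, 2)` picks indices `[1, 3]`, the two with the largest predicted loss. Adding the loss to the discriminator's input is what makes the selection depend on the prediction task as well as on the latent distribution.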
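CRAG is described as generating adversarial samples through projected gradient descent. In the adversarial-example sense, this means taking signed-gradient steps that increase a loss, then projecting back into a small box around the original point. The sketch below assumes a continuous latent code and a hypothetical linear property predictor with squared-error loss, so the gradient is exact. It illustrates L-infinity PGD only and is not CRAG's actual training procedure.

```python
from typing import List, Sequence


def predict(w: Sequence[float], b: float, z: Sequence[float]) -> float:
    """Hypothetical linear property predictor f(z) = w . z + b."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b


def squared_error_grad(w: Sequence[float], b: float,
                       z: Sequence[float], y: float) -> List[float]:
    """Gradient of (f(z) - y)^2 with respect to z for the linear predictor."""
    residual = predict(w, b, z) - y
    return [2.0 * residual * wi for wi in w]


def sign(x: float) -> int:
    """Return -1, 0 or +1."""
    return (x > 0) - (x < 0)


def pgd_latent(z: Sequence[float], y: float, w: Sequence[float], b: float,
               eps: float = 0.1, alpha: float = 0.03, steps: int = 10) -> List[float]:
    """L-infinity PGD: ascend the prediction loss, then clip back to the
    box {z' : max_i |z'_i - z_i| <= eps} after every step."""
    z_adv = list(z)
    for _ in range(steps):
        g = squared_error_grad(w, b, z_adv, y)
        # Signed-gradient ascent step on the loss.
        z_adv = [zi + alpha * sign(gi) for zi, gi in zip(z_adv, g)]
        # Projection onto the eps-box around the original code.
        z_adv = [min(max(za, z0 - eps), z0 + eps) for za, z0 in zip(z_adv, z)]
    return z_adv
```

For example, `pgd_latent([0.5, 0.5], 0.0, [1.0, -2.0], 0.0)` moves the code to roughly `[0.4, 0.6]`, the corner of the 0.1-box where the squared error grows from 0.25 to 0.64. The projection step keeps each adversarial sample close to a real molecule's encoding, which is what separates PGD from unconstrained gradient ascent.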
Keywords/Search Tags: Molecular Property Prediction, Molecular Generation, Deep Clustering, Active Learning, Adversarial Learning