Multi-view Molecular Property Prediction Based On Language Models

Posted on:2022-07-28

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhang

Full Text:PDF

GTID:2504306572960289

Subject:Software engineering

Abstract/Summary:

Drug therapy plays a very important role in human life and health. However, drug research and development is a long and complex process that requires substantial manpower and funding. The high cost of development ultimately affects drug prices and the treatment options available to patients. Molecular property prediction is an important task in drug discovery: it helps researchers find drug candidates and speeds up drug development, thereby reducing its cost. Deep learning has already made some achievements in drug research and development, and continuously improving the accuracy and reliability of molecular property prediction has become a main pursuit of researchers.

This thesis studies molecular property prediction. To alleviate the limited quantity of labeled compound data and to extract effective molecular representations, it uses pre-trained language models to learn compound knowledge from a large-scale unlabeled compound corpus and then transfers that knowledge to small labeled datasets. The main research work is divided into the following two parts.

First, to encode the substructural features of molecules, a molecular-fingerprint-based property prediction model (FP-BERT) is proposed, which uses stacked Transformer encoders to learn bidirectional molecular representations from a compound corpus. Each compound in the labeled dataset is represented as a set of molecular substructures, and its learned representation is obtained by encoding the substructures in the molecular fingerprint with the pre-trained FP-BERT model. A CNN-based prediction model is then constructed for supervised learning.

Second, to construct a more comprehensive molecular representation, a multi-view molecular property prediction model, MV-Mol BERT, is proposed, which integrates information from different molecular representations. MV-Mol BERT encodes each compound from two views, SMILES (Simplified Molecular Input Line Entry System) and molecular fingerprints, and extracts high-dimensional features from each view with a CNN. The representations of the two views are then concatenated into a multi-view molecular representation, and a neural network prediction model is constructed for supervised learning of molecular properties.

The predictive performance of the FP-BERT and MV-Mol BERT models was evaluated on a classification dataset (HIV) and regression datasets (ESOL, FreeSolv, Lipophilicity, Malaria, CEP). The experimental results demonstrate the ability of the FP-BERT model to extract molecular fingerprint features. In addition, the multi-view prediction model MV-Mol BERT achieves better performance than FP-BERT.
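
The two-view idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the tokenizer pattern, the function names (`tokenize_smiles`, `fuse_views`, `predict`), and the linear head standing in for the neural network predictor are all assumptions made for illustration, and the per-view feature vectors are taken as given rather than produced by BERT and CNN encoders.

```python
import re

# Hypothetical SMILES tokenizer (an assumption, not the thesis's vocabulary):
# bracket atoms, two-letter halogens, two-digit ring closures, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens for the SMILES view."""
    return SMILES_TOKEN.findall(smiles)


def fuse_views(smiles_features, fingerprint_features):
    """Concatenate the per-view feature vectors into one multi-view vector."""
    return list(smiles_features) + list(fingerprint_features)


def predict(fused, weights, bias=0.0):
    """Linear head used here as a stand-in for the neural network predictor."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    return sum(x * w for x, w in zip(fused, weights)) + bias


# Toy usage: the feature values below are made up, not model outputs.
tokens = tokenize_smiles("CC(=O)Cl")
fused = fuse_views([1.0, 2.0], [3.0, 4.0])
score = predict(fused, [0.5, 0.5, 0.5, 0.5])
```

In a real pipeline the two input vectors would come from the SMILES and fingerprint encoders, and the fused vector would feed a trained network rather than fixed weights.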

Keywords/Search Tags:

molecular property prediction, molecular representation learning, deep learning, pre-trained language model, multi-view learning

Related items

1	Molecular Property Prediction Based On Deep Learning And Multi-Dimensional Encoding Molecular Information
2	Research On Key Technologies For Drug Molecule Recognition And Property Prediction Based On Deep Learning
3	Research On Molecular Property Prediction Model Based On Pseudo-twin Networ
4	Multi-view Deep Gaussian Process Model And Its Application In Pathological Diagnosis
5	Research On Molecular Property Prediction And Generation Technology Based On Machine Learning
6	Pathological Image Diagnosis Of Children’s Tumors Based On Deep Learning
7	Research On Drug Combination Property Prediction Methods Based On Deep Learning
8	Research On Molecular Property Prediction Methods Based On BERT
9	Research On Drug Discovery Methods Based On Multi-view Network Representation Learning
10	Named Entity Recognition Of Online Medical Consulting Texts Based On Deep Learning