Property Classification Of Molecules Based On SMILES Expressions

Posted on:2024-03-20

Degree:Master

Type:Thesis

Country:China

Candidate:P Xue

GTID:2544307079991519

Subject:Applied statistics

Abstract/Summary:

Molecular property classification is a central problem in modeling the association between molecular structure and properties. A model that links a molecule's structure to its physical and chemical properties makes it possible to mine structural information more effectively and to accelerate the development and application of molecules and drugs, thereby reducing the research and development cost of new drugs. Approaching the problem from deep learning, and in particular from natural language processing, this thesis proposes an XLNet_BiLSTM_CNN_Attention molecular property classification model that mines molecular structure information effectively and improves classification accuracy. In extensive comparative experiments, the model's classification accuracy is generally better than that of molecular property classification models built on BERT pretraining, by about 1% on average; on the four datasets (unstructured SMILES text data) it reaches an accuracy of 81.16% and an F1 score of 79.61%. Ablation experiments further confirm that fusing the component models improves classification accuracy: compared with the basic XLNet_FC classification model, accuracy improves by 3.38%.

In addition, a Gibbs-SVM control-variable sampling model is constructed and used to extract a batch of similar molecules from dataset 1 (structured molecular descriptor data), and these molecules are then clustered. The results show, first, that the Gibbs-SVM model identifies the optimal variation range of the molecular descriptors and that the molecules screened within this range are highly similar to one another. Second, under the deep learning model, dimensionality reduction and cluster analysis of the word embedding vectors for the batch of molecules yields clustering highly similar to that of the Gibbs-SVM model. This indicates that the molecular embedding vectors learned by the pretrained model represent molecular features well and can, to a considerable extent, replace molecular descriptor data in tasks such as molecular clustering and classification.

Experiments at different tokenization granularities show that atom-level tokenization outperforms functional-group-level tokenization, improving accuracy by 4.6%, which suggests that the model can better capture interactions between atoms. Data granularity therefore affects how difficult the data is for a given model to learn. Overall, the proposed model performs well on molecular property classification and is suitable for such tasks.
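To make atom-level tokenization concrete, the following is a minimal sketch of splitting a SMILES string into atom-level tokens. The abstract does not say which tokenizer the thesis uses, so the regular expression (a pattern commonly used for SMILES tokenization) and the names `SMILES_TOKEN_PATTERN` and `tokenize_smiles` are assumptions made for illustration only.

```python
import re
from typing import List

# Assumed pattern: a regex commonly used for atom-level SMILES tokenization.
# It is NOT taken from the thesis, which does not specify its tokenizer.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]"                 # bracket atoms such as [C@H] or [NH4+]
    r"|Br?|Cl?|N|O|S|P|F|I"        # organic-subset atoms (Br and Cl kept whole)
    r"|b|c|n|o|s|p"                # aromatic atoms
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$"  # branches, bonds, misc symbols
    r"|%[0-9]{2}|[0-9])"           # ring-closure labels
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into atom-level tokens (hypothetical helper)."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Reject input containing characters the pattern does not cover,
    # instead of silently dropping them.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Alanine with a stereocenter: the bracket atom stays a single token.
    print(tokenize_smiles("C[C@H](N)C(=O)O"))
    # Two-letter halogens are kept as single tokens.
    print(tokenize_smiles("ClCCBr"))
```

Token sequences of this kind could then be mapped to vocabulary indices and passed to a pretrained language model. A functional-group-level tokenizer would instead merge several of these tokens into larger units, which is the coarser granularity the abstract compares against.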

Keywords/Search Tags:

QSPRs, XLNet, Gibbs-SVM, Molecular Property Classification, NLP

Related items

1	Research On T Waves Alternans Detection Based On Gibbs Sampler And CCA
2	A New Algorithm Of Gibbs Artifacts Removing In MR Images
3	Research On Algorithm Of T-wave Detection Based On Morphological Guidance
4	Based On The Motif Of The Gibbs Sampling Algorithm To Find New Methods Of Research
5	Application Of Bayesian Methods In COVID-19 Event Warning Based On Gibbs Sampling
6	An Integrative Study On Molecular Classification And Molecular Networks Of Bladder Urothelial Carcinoma
7	Research On The Molecular Characteristics Of Chinese Herbal Medicine With Different Properties Based On Data Mining
8	New Approaches To Segmentation Of Brain MR Images Based On Gibbs Random Field Theory
9	Image Restoration And Medical Image Reconstruction Based On Generalized Fuzzy Gibbs Random Field
10	Characteristics Of Genetic Classification In Lung Adenocarcinomas And It's Relation With Morphologic Classification