Study On Modeling Of Solvent Property Prediction And Risk Assessment Based On Artificial Intelligence

Posted on: 2021-03-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Su
GTID: 1481306107490714
Subject: Chemical Engineering and Technology
Abstract/Summary:
Physicochemical properties are central to the design and development of solvents and separation processes, and evaluating the potential safety, health and environmental (SH&E) risks of solvents is important for achieving green separation. However, measuring the many properties of a new chemical experimentally is time-consuming and costly. Moreover, classic quantitative structure-property relationship (QSPR) models predict solvent properties using linear mathematics, which is not suited to the classification tasks that arise in the fuzzy evaluation of potential SH&E risks. To improve the efficiency of solvent and separation process development, it is therefore valuable to establish QSPR models between molecular structures and properties from existing experimental data, to develop intelligent property prediction and risk assessment models, and to enable virtual screening of solvents across a large chemical space. In this work, we propose a framework for developing QSPR models using intelligent technologies, namely deep learning and several machine learning algorithms, which correlates molecular structure feature vectors with target properties. Properties commonly used in solvent design and screening, such as critical properties, flammability and the octanol-water partition coefficient, are used to build predictive models based on deep learning. A strategy based on machine learning algorithms and molecular fingerprints is proposed to evaluate the potential SH&E risks of solvents, and models built from different learning algorithms and fingerprints are compared. The main contributions of this study are as follows:

(1) A novel deep-learning-based strategy is proposed for the automatic identification and vectorization of molecular structures by computer, without the need to calculate any numerical molecular descriptors. For
this, we study canonicalization algorithms for molecular structures and develop RDKit-based software for translating molecular graphs. The program canonicalizes a molecular structure and converts it into a tree-like data structure stored as a directed acyclic graph (DAG). A simple method is also devised to combine the symbols of chemical bonds and atoms into substrings that represent molecular substructures. Word embedding, a widely used algorithm, is used to vectorize each vertex of the DAG, labeled by its substring, so that the DAG can be learned by a computer. A tree-structured long short-term memory (Tree-LSTM) network, a type of graph neural network, is used to traverse the DAG, mimic its topology and vectorize it.

(2) A deep learning architecture is designed in which the Tree-LSTM network is combined with feed-forward neural networks (FNNs) to form a deep neural network (DNN) that correlates the feature vectors with target properties. To obtain better-performing models, various methods of data pre-processing, data analysis and outlier recognition are studied; a deep-learning-based QSPR modeling workflow covering data collection, data transformation, and model training and validation is proposed; and the key technologies are elucidated.

(3) On the basis of the proposed QSPR modeling framework, intelligent predictive models for the critical properties of solvents are established. These models correlate three critical properties (critical temperature Tc, critical pressure Pc and critical volume Vc) of nearly 1,400 substances, and are compared with classical group contribution (GC) methods. On the training set, the mean absolute errors (MAEs) are 22.48 K (Tc), 1.34 bar (Pc) and 7.10 cm³/mol (Vc), and the mean relative errors (MREs) are 4.23% (Tc), 3.81% (Pc) and 1.97% (Vc). On the test set, the MAEs are 23.77 K (Tc), 3.18 bar (Pc) and 19.92 cm³/mol (Vc), and the MREs are 5.29% (Tc), 8.29% (Pc) and 6.15% (Vc). Compared with the Joback
and Constantinou-Gani GC models, the proposed deep learning models show better accuracy and better discrimination between isomers.

(4) A multi-task learning architecture is also developed on the basis of the Tree-LSTM network and FNNs, allowing several properties to be correlated and predicted simultaneously. Four flammability-related hazardous properties, namely flash point temperature (FPT), auto-ignition temperature (AIT), lower flammable limit (LFL) and upper flammable limit (UFL), are used to develop the multi-task predictive model. Two strategies, joint training and alternating training, are both used to train the multi-task DNN (MDNN) to achieve faster convergence, and the MDNN model shows good predictive performance on the test set. The model is trained on more compounds than previous models and predicts the four hazardous properties simultaneously.

(5) Models are developed to rapidly evaluate the potential SH&E risks of solvents based on machine learning algorithms and molecular fingerprints. Following the CHEM21 solvent selection guide, the potential SH&E risks of 1082 substances are scored to form the sample data set. Three machine learning algorithms are then used to correlate these scores with each of five molecular fingerprints. The model combining the RDKit fingerprint (RDKFP) with a random forest (RF) classifier achieves the best accuracy among the combinations of fingerprints and algorithms, with F1 scores on the test sets of 0.70, 0.61 and 0.71 for the safety, health and environment scores, respectively. These results suggest that the learned model can identify solvents with high SH&E risk based on molecular structure. The model can therefore evaluate the potential SH&E risks of solvents quickly at the initial stage of development, without requiring concrete values of the related properties.
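
The error and classification metrics quoted above (MAE, MRE and F1) can be made concrete with a small sketch. The abstract does not define them, so the usual textbook definitions are assumed here: MAE as the mean of |true - predicted|, MRE as the mean of |true - predicted| / |true| expressed in percent, and F1 as the harmonic mean of precision and recall for one class. The function names and all sample values are hypothetical and are not taken from the dissertation.

```python
from typing import Hashable, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |true - predicted|, in the units of the property."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_relative_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |true - predicted| / |true|, in percent (assumed definition)."""
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


def f1_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
             positive: Hashable) -> float:
    """F1 for a single class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical critical temperatures in K (illustrative values only).
tc_true = [500.0, 600.0, 400.0]
tc_pred = [510.0, 570.0, 400.0]
mae = mean_absolute_error(tc_true, tc_pred)
mre = mean_relative_error(tc_true, tc_pred)

# Hypothetical risk labels for four solvents (illustrative values only).
risk_true = ["high", "low", "high", "low"]
risk_pred = ["high", "high", "low", "low"]
f1_high = f1_score(risk_true, risk_pred, positive="high")

print(f"MAE = {mae:.2f} K, MRE = {mre:.2f} %, F1(high) = {f1_high:.2f}")
```

For a multi-class score such as the safety, health and environment ratings, per-class F1 values would still need to be averaged; the abstract does not say which averaging scheme was used, so that step is left out of the sketch.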
Keywords/Search Tags: Deep learning, Property prediction, Machine learning, QSPR, Risk assessment