
The Comparisons Of Different Methods In QSAR And Their Applications In Environmental Science

Posted on: 2012-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: D X Chen
GTID: 2131330335469553
Subject: Chemical informatics
Abstract/Summary:
With accelerating urbanization and rapid economic development, thousands of synthetic chemicals have been released into the environment, and Environmental Risk Assessment is becoming increasingly important. With the development of computer science, statistics, physical organic chemistry, biology, etc., QSAR/QSPR has become an indispensable tool in Environmental Risk Assessment. An accurate and efficient QSAR/QSPR model can not only predict the migration and transformation behavior of organic pollutants quantitatively, but also avoid the time lag of experimental studies. In addition, it is the basis and premise for pollution prevention. It is therefore of great importance from both theoretical and practical perspectives.

This thesis focuses on persistent organic pollutants and typical organic pollutants, and compares different QSAR/QSPR models. The following studies were carried out:

Chapter 1: Environmental Risk Assessment, QSPR, and their progress in the study of organic pollutants are briefly described.

Chapter 2: Quantitative structure-property relationship (QSPR) analyses were performed on 64 persistent organic pollutants (POPs) to model their soil sorption coefficient (Koc). Three machine learning methods, genetic algorithm-multiple linear regression (GA-MLR), least-squares support vector machine (LSSVM) and local lazy regression (LLR), were used to develop QSPR models. The model based on LLR is better than the other two, with a squared correlation coefficient (R2) of 0.894 for the training set and 0.860 for the test set. In addition, the leave-one-out cross-validation R2 is 0.860.
These results indicate that the proposed approaches have good predictive ability and robustness, and that they can be used as a general tool to estimate the Koc of other persistent organic pollutants.

Chapter 3: Using accurate molecular structures obtained at the B3LYP/6-311+G(d,p) level, an accurate and validated QSPR model was developed for the organic carbon adsorption coefficient of 70 PCBs. A genetic algorithm (GA) was used to select the descriptors, and multiple linear regression (MLR) and least-squares support vector machine (LSSVM) were then used to build linear and nonlinear QSPR models. We found that the model based on DFT-optimized geometry is better than that based on semi-empirical optimized geometry.

Chapter 4: Using GA to select descriptors and LSSVM to build models, a QSPR model for the Henry's law constant of 96 heterogeneous organic pesticides was built. Good results were obtained: for the training set, R2=0.785, Q2=0.637, RMSE=1.010; for the test set, R2=0.734, RMSE=1.171. The results show that GA-LSSVM is a promising method for selecting descriptors and building models.
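
As a minimal sketch of how validation statistics such as the R2, leave-one-out Q2 and RMSE values quoted above are typically computed, the following Python example fits a one-descriptor least-squares line. Everything in it is an assumption made for illustration: the descriptor values and responses are made up, the function names (`fit_line`, `predict`, `r2`, `rmse`, `q2_loo`) are hypothetical, the simple linear model stands in for the GA-MLR, LSSVM and LLR models of the thesis, and Q2 is taken as 1 - PRESS/SS_tot, one common convention that the thesis may or may not follow.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for a single descriptor: y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b  # (intercept, slope)


def predict(model, x):
    a, b = model
    return [a + b * xi for xi in x]


def r2(y, y_hat):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of y."""
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    """Root mean squared error between observed and predicted values."""
    return sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat)) / len(y))


def q2_loo(x, y):
    """Leave-one-out Q2: refit without each compound, then predict it."""
    held_out = []
    for i in range(len(x)):
        model = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        held_out.append(predict(model, [x[i]])[0])
    return r2(y, held_out)  # equals 1 - PRESS / SS_tot


# Hypothetical data: one descriptor and a response such as log Koc.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
x_test = [1.5, 3.5, 5.5]
y_test = [2.4, 4.6, 6.4]

model = fit_line(x_train, y_train)
print("train R2  :", round(r2(y_train, predict(model, x_train)), 3))
print("train Q2  :", round(q2_loo(x_train, y_train), 3))
print("train RMSE:", round(rmse(y_train, predict(model, x_train)), 3))
print("test  R2  :", round(r2(y_test, predict(model, x_test)), 3))
print("test  RMSE:", round(rmse(y_test, predict(model, x_test)), 3))
```

For a least-squares fit scored this way, Q2 cannot exceed the training R2, because each held-out prediction is made without the compound being predicted. A large gap between the two is usually read as a sign of overfitting.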
Keywords/Search Tags: QSAR/QSPR, Genetic Algorithm-Least Squares Support Vector Machine (GA-LSSVM), local lazy regression (LLR), organic pollutants