
Application Of PLS And GA On QSAR Of Selected Organic Pollutants

Posted on: 2007-04-05    Degree: Doctor    Type: Dissertation
Country: China    Candidate: G H Ding    Full Text: PDF
GTID: 1101360182482404    Subject: Environmental Engineering
Abstract/Summary:
Quantitative structure-activity relationships (QSARs) of organic pollutants are of great importance to the ecological risk assessment of organic compounds and to pollution control and prevention. Partial least squares (PLS) regression is one of the main modeling methods in QSAR studies of organic pollutants because it can analyze data with strong multicollinearity. To obtain optimal QSAR models, various variable selection methods have been developed. Among them, methods based on the genetic algorithm (GA) perform better because they are capable of global search and optimization. However, every variable selection method has its own disadvantages.

In this study, new variable selection methods were put forward for two typical QSARs with different sample sizes; the performance of these methods was compared and discussed; and optimal QSAR models were established, analyzed, and interpreted.

1. Based on quantum chemical and topological descriptors, temperature-dependent predictive models for the solid vapor pressure (Ps) and subcooled liquid vapor pressure (PL) of polychlorinated dibenzo-p-dioxins/dibenzofurans (PCDD/Fs) were developed using PLS. Three variable selection methods for QSAR studies with large samples were proposed and used in the modeling: (a) a method based on stepwise regression; (b) a method based on the VIP (variable importance in projection) values of the predictor variables in PLS models; and (c) a method based on both VIP and the Q²cum of PLS models (the cumulative fraction of the variance of the dependent variable explained by the PLS components, as determined by cross-validation).
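The two search criteria named above, VIP and a cross-validated Q², can be illustrated with a minimal pure-Python sketch. It is not the dissertation's implementation: the single-response NIPALS fit, the leave-one-out scheme, the simplified form Q² = 1 − PRESS/SS (rather than a component-wise cumulative product), the function names, and the small data set are all assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def column(M, j):
    return [row[j] for row in M]

def fit_pls1(X, y, n_comp):
    """Fit a single-response PLS model by NIPALS on mean-centred data."""
    n, p = len(X), len(X[0])
    x_mean = [sum(column(X, j)) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                # y residuals
    W, P, Q, ssy = [], [], [], []
    for _ in range(n_comp):
        w = [dot(column(E, j), f) for j in range(p)]           # weight vector
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:                                       # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                         # scores
        tt = dot(t, t)
        load = [dot(column(E, j), t) / tt for j in range(p)]   # X loadings
        q = dot(f, t) / tt                                     # y loading
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w); P.append(load); Q.append(q); ssy.append(q * q * tt)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q, "ssy": ssy}

def predict(model, x):
    """Predict one sample by deflating it component by component."""
    e = [x[j] - model["x_mean"][j] for j in range(len(x))]
    y_hat = model["y_mean"]
    for w, load, q in zip(model["W"], model["P"], model["Q"]):
        t = dot(e, w)
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat

def vip(model):
    """VIP per predictor: squared weights, weighted by the y variance each component explains."""
    p = len(model["x_mean"])
    total = sum(model["ssy"])
    return [math.sqrt(p * sum(s * w[j] ** 2 for s, w in zip(model["ssy"], model["W"])) / total)
            for j in range(p)]

def q2_loo(X, y, n_comp):
    """Simplified Q2 = 1 - PRESS/SS from leave-one-out cross-validation."""
    press = 0.0
    for i in range(len(X)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_comp)
        press += (y[i] - predict(model, X[i])) ** 2
    y_mean = sum(y) / len(y)
    return 1.0 - press / sum((v - y_mean) ** 2 for v in y)

# Made-up data: column 1 is nearly collinear with column 0, column 2 is noise.
X = [[1, 2.1, 0.5], [2, 3.9, -0.3], [3, 6.2, 0.2], [4, 7.8, -0.4],
     [5, 10.1, 0.1], [6, 12.0, 0.3], [7, 13.9, -0.2], [8, 16.2, -0.1]]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.9, 7.2, 7.9]

model = fit_pls1(X, y, n_comp=1)
vips = vip(model)
q2 = q2_loo(X, y, n_comp=1)
print("VIP:", [round(v, 3) for v in vips])
print("Q2 (leave-one-out):", round(q2, 3))
```

In a selection loop of the kind described in (b) and (c), predictors with low VIP would be candidates for removal, and a removal would be kept only if the cross-validated Q² does not drop; the exact rules used in the dissertation are not given in this abstract.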
After comparison and analysis, it was concluded that the first method is not robust and cannot yield better models because of the multicollinearity among predictor variables; the second method always ends at a locally optimal solution because VIP is not a good search criterion and the search space of the method is limited; and the third method performs best of the three because it takes Q²cum as the main search criterion and its search space is sizable.

The influence of the entropic factor on the vapor pressures (P) of PCDD/Fs was also investigated. It was concluded that the entropic factor is a key factor for Ps of PCDD/Fs, but not for PL. The Q²cum values of the two final models were both higher than 0.970, indicating that the models have good predictive ability and robustness and could be used to estimate vapor pressures of PCDD/Fs at different temperatures. The optimal models show that the main factors governing Ps of PCDD/Fs are, in decreasing order of importance, temperature, intermolecular dispersive interactions, the entropic factor, and intermolecular dipole-dipole and dipole-induced dipole interactions, whereas the main factors for PL are temperature and intermolecular dispersive interactions.

2. Based on the octanol/water partition coefficient (Kow) and some theoretical molecular structural descriptors, two QSAR models were developed by PLS for the acute toxicity of photosynthetic process (PHS) inhibitors and acetolactate synthase (ALS) inhibitors to Chlorella vulgaris. The best variable selection method from the above QSAR studies was also applied here. The optimal models it produced were often over-fitted, so it was not suitable for QSAR studies with small samples. A new variable selection method was therefore put forward, based on forward selection, the Q²cum of the PLS model, and the VIP of the predictor variables.
Because of the modifications made, this method could obtain better QSAR models and could be used for variable selection in QSAR studies with small samples. After variable selection, the Q²cum values of the two optimal QSAR models for the acute toxicity of PHS inhibitors and ALS inhibitors to Chlorella vulgaris were both higher than 0.9, indicating that the models have good predictive ability and robustness and could be used to predict the acute toxicity to Chlorella vulgaris of related herbicides with the same modes of toxic action. The optimal models show that the main factors influencing the acute toxicity of PHS inhibitors to Chlorella vulgaris are the potential for electron transfer, hydrogen bonding, and non-specific intermolecular interactions, and that the main factors influencing the acute toxicity of ALS inhibitors are non-specific intermolecular interactions and hydrogen bonding.

3. Three new evaluation parameters were proposed by combining r² (the goodness of fit of a model) with q² (the predictive ability of a model); one of them was selected as the fitness function of the GA-PLS algorithm, a GA-based variable selection method. The improved GA-PLS algorithm was applied to the QSAR studies on P of PCDD/Fs and on the acute toxicity of the selected herbicides to Chlorella vulgaris. In both studies it yielded better models than those obtained by the other variable selection methods, indicating that the GA-PLS algorithm has a better variable selection ability, is robust, and can be used for different QSARs. Because GA is capable of global search and optimization, the improved GA-PLS algorithm could find the optimal model under given conditions. It also provided multiple models simultaneously, which enabled researchers to discuss the QSARs from different aspects.
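The GA-based selection described in point 3 can be sketched as a search over bit masks, one bit per candidate descriptor. This is only an illustrative skeleton under stated assumptions: the abstract does not give the formula of the chosen evaluation parameter, so `combined_score` below is a hypothetical stand-in, and `toy_fitness` fakes r² and q² instead of fitting PLS models. The operators (truncation selection, one-point crossover, bit-flip mutation) and all parameter values are likewise assumptions.

```python
import random

def combined_score(r2, q2):
    # Hypothetical placeholder, NOT the dissertation's parameter:
    # reward predictive ability and penalise the gap between fit and prediction.
    return q2 - abs(r2 - q2)

def ga_select(n_vars, fitness, pop_size=30, generations=40,
              p_mut=0.05, top_k=3, seed=0):
    """Search variable subsets (bit masks) and return the top_k distinct masks."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_vars)               # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation adds or drops a variable with probability p_mut.
            children.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = elite + children
    best, seen = [], set()
    for mask in sorted(pop, key=fitness, reverse=True):
        if tuple(mask) not in seen:
            seen.add(tuple(mask))
            best.append(mask)
        if len(best) == top_k:
            break
    return best

# Toy stand-in for "fit a PLS model on the selected variables":
# variables 0, 2 and 5 are pretended to be informative, the rest are noise.
INFORMATIVE = {0, 2, 5}
N_VARS = 8

def toy_fitness(mask):
    chosen = {j for j, keep in enumerate(mask) if keep}
    r2 = len(chosen & INFORMATIVE) / len(INFORMATIVE)
    q2 = r2 - 0.05 * len(chosen - INFORMATIVE)           # noise variables hurt prediction
    return combined_score(r2, q2)

models = ga_select(N_VARS, toy_fitness)
for mask in models:
    print([j for j, keep in enumerate(mask) if keep], round(toy_fitness(mask), 3))
```

Returning several distinct high-scoring masks, rather than a single winner, mirrors the point made above that the algorithm provides multiple models at once. In a real GA-PLS run, `toy_fitness` would be replaced by a function that fits a PLS model on the selected descriptors and computes r² and q² from it.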
Keywords/Search Tags: PLS, GA, Organic Pollutants, QSAR, Variable Selection