
Research on QSAR of Environmental Toxicants Based on Data Mining

Posted on: 2014-01-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Su
Full Text: PDF
GTID: 1261330425483462
Subject: Materials science
Abstract/Summary:
Environmental toxicants are chemical contaminants that may harm living organisms at concentrations found in the environment. One of the current interests in environmental sciences and toxicology is the ranking of chemical substances with respect to their potential hazardous effects on humans, wildlife, and aquatic flora and fauna. Quantitative structure-activity relationships (QSAR) have provided a valuable approach to research on the toxicity of organic chemicals in such studies. Data mining technology is used to extract potential and useful information from databases, and is playing an increasingly important role in the study of QSAR. In this paper, ensemble learning and Bayesian network methods were used to investigate several topics in the QSAR of environmental toxicants. The main work of the paper contains the following four parts:

A quantitative structure-activity relationship (QSAR) model was developed to correlate the structures of 56 organic compounds with their toxicity to tadpoles (Rana japonica). The 68 molecular descriptors, derived solely from the structures of the organic compounds, were calculated using MODEL and ChemOffice. The descriptors were screened by the minimum Redundancy Maximum Relevance (mRMR)-genetic algorithm (GA)-support vector regression (SVR) method. The parameters of the SVR model were optimized using the particle swarm optimization method. The QSAR model was developed from a training set of 40 compounds using the SVR method, with a good coefficient of determination (R² = 0.95). The QSAR model was then tested using an external test set of 16 compounds, with satisfactory external predictive ability (q² = 0.90).

Toxicity data for 110 organic compounds to tadpoles (Rana temporaria) were collected from different references, and 66 descriptors were calculated with HyperChem 7.5 and JChem for Excel. First, 4 descriptors were selected using the CFS (Correlation-based Feature Subset) method.
Then an SVR ensemble learning approach based on Bagging was employed to build the model. Finally, the performance of the SVR ensemble learning approach based on Bagging was compared with that of the SVR ensemble learning approach based on parameter. As a result, the SVR ensemble learning approach performed better than the SVR ensemble learning approach based on the Bagging algorithm and molecular descriptors. It can be concluded that the SVR ensemble learning approach has the potential to improve the performance of SAR analysis.

A quantitative structure-activity relationship (QSAR) study was performed to develop a model correlating the structures of 581 aromatic compounds with their aquatic toxicity to Tetrahymena pyriformis. A set of 68 molecular descriptors, derived solely from the structures of the aromatic compounds, was calculated using Gaussian 03, HyperChem 7.5, and TSAR V3.3. A comprehensive feature selection method, the CFS (Correlation-based Feature Subset) method, was applied to select the best descriptor subset for the QSAR analysis. The SVR Ensemble Selection from Libraries of Models method was employed to model the toxic potency from a training set of 500 compounds, and was tested using an external test set of 81 compounds. A good coefficient of determination (R² = 0.79) and external predictive ability (q² = 0.69) were obtained, indicating the potential of SVR Ensemble Selection from Libraries of Models in facilitating the prediction of toxicity.
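
The abstract names "Ensemble Selection from Libraries of Models" but gives no algorithmic detail. As a rough illustration only, the sketch below assumes the commonly described variant: greedy forward selection with replacement, where the model whose addition most lowers the hold-out error of the averaged prediction is added at each round. The function names, the RMSE criterion, and the toy numbers are all hypothetical, and plain prediction vectors stand in for trained SVR models; this is not the dissertation's implementation.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def average(pred_rows):
    """Element-wise mean of several prediction vectors."""
    n = len(pred_rows)
    return [sum(col) / n for col in zip(*pred_rows)]


def ensemble_selection(library, y_valid, max_rounds=20):
    """Greedy forward selection with replacement (assumed variant).

    library: dict mapping a model name to its predictions on a hold-out set.
    Returns the selected model names; a repeated name acts as a larger weight.
    """
    selected = []
    best_score = float("inf")
    for _ in range(max_rounds):
        # Score every candidate by the error of the ensemble it would produce.
        scores = {
            name: rmse(y_valid, average([library[m] for m in selected] + [preds]))
            for name, preds in library.items()
        }
        name = min(scores, key=scores.get)
        if scores[name] >= best_score:
            break  # no candidate improves the ensemble any further
        selected.append(name)
        best_score = scores[name]
    return selected


# Hypothetical hold-out toxicities and predictions from three imagined SVR models.
y_valid = [1.0, 2.0, 3.0, 4.0]
library = {
    "svr_a": [1.3, 2.3, 3.3, 4.3],  # biased high
    "svr_b": [0.5, 1.5, 2.5, 3.5],  # biased low
    "svr_c": [2.0, 2.0, 2.0, 2.0],  # nearly uninformative
}
chosen = ensemble_selection(library, y_valid)
ensemble_pred = average([library[m] for m in chosen])
print(chosen, round(rmse(y_valid, ensemble_pred), 3))
```

In this toy case the procedure combines the two oppositely biased models and ignores the uninformative one. In a real study the library would hold predictions from many SVR models trained with different kernels and parameters, and the selection would be scored on data kept separate from the external test set.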
Keywords/Search Tags: QSAR, environmental toxicants, data mining, SVM, ensemble learning, Bayesian networks