
The prediction of physical properties and biological activities of organic compounds from their molecular structures

Posted on: 2004-05-17
Degree: Ph.D.
Type: Thesis
University: The Pennsylvania State University
Candidate: McElroy, Nathan Richard
Full Text: PDF
GTID: 2461390011472393
Subject: Chemistry
Abstract/Summary:
This thesis focuses upon the methodology of quantitative structure-property relationships (QSPRs) and quantitative structure-activity relationships (QSARs), and the results of four studies that implement that methodology. The goal of this work is to create predictive models that will link the molecular structures of sets of organic compounds to their physical properties and/or biological activities. Quantitative approaches build models to predict values over a continuous range, while classification approaches build models to categorize compounds into one of two classes.

Chapter 1 contains a brief history of chemometrics as it applies to the research within the thesis. Chapter 2 introduces group methodology for creating quantitative structure-activity and structure-property relationships. This includes molecular structure representation and modeling, molecular descriptor generation, objective and subjective feature selection, linear and non-linear model formation, and model validation. Chapter 3 describes group methodology used to create classification models in QSAR. Discussions include k-nearest neighbor, linear discriminant analysis, and classification trees as classification algorithms. Data set similarity measures are also introduced by way of atom-pair descriptors.

Chapters 4 through 7 contain results and discussion of four applications of quantitative and classification methodology toward physical property and biological activity prediction. Chapter 4 describes the prediction of aqueous solubility values for three sets of organic compounds. Quantitative models for an oxygen-containing compound data set, a nitrogen-containing compound data set, and a combined data set are presented. Chapter 5 presents a quantitative predictive model for a set of alkylurea compounds that inhibit the soluble epoxide hydrolase enzyme in humans, and two classification models for alkylurea compounds that inhibit the soluble epoxide hydrolase enzyme in humans and mice. Chapter 6 describes a classification approach for predicting the clastogenic potential of organic compounds toward Chinese hamster lung and ovary fibroblasts. Three different data sets, created by a similarity measure, are modeled to show that classification rates improve with smaller, more similar data set compounds. Chapter 7 describes classification models to predict responses of organic compounds in an Ames test using the TA100 strain of Salmonella typhimurium. Four data sets containing 77 positive response compounds and 154 different negative response compounds are modeled.
Keywords/Search Tags:Compounds, Data set, Quantitative, Molecular, Methodology, Classification, Physical, Biological