
Prediction of chemical properties and biological activities of organic compounds from molecular structure and use of probabilistic and generalized regression neural networks

Posted on: 2004-01-05 | Degree: Ph.D. | Type: Thesis
University: The Pennsylvania State University | Candidate: Mosier, Philip D | Full Text: PDF
GTID: 2461390011977090 | Subject: Chemistry
Abstract/Summary:
This thesis describes the development of, and the methodology used to obtain, quantitative structure-activity relationships (QSAR) for several different sets of compounds. QSAR models provide statistical, and often physically meaningful and interpretable, relationships between the physical characteristics of molecules and their observed activities. The QSAR model-building process used to develop the models presented in this thesis is described, and aspects of molecular representation and modeling are discussed. This is followed by a discussion of the ways in which various aspects of molecular structure may be encoded through topological, geometric, electronic, and polar surface area descriptors. The process of selecting pertinent descriptor subsets with the stochastic optimization methods of genetic algorithms (GA) and generalized simulated annealing (GSA) is outlined. The GA and GSA are used with multiple linear regression (MLR), computational neural networks (CNN), or generalized regression neural networks (GRNN) to find high-quality quantitative models, and with linear discriminant analysis (LDA), k-nearest neighbors analysis (k-NN), and probabilistic neural networks (PNN) to find high-quality classification models. Each model presented is validated using a set of compounds that was not used to build it.

The theory of the PNN and its close relative, the GRNN, is discussed in detail. Effective PNN models are presented that identify molecules as potential human soluble epoxide hydrolase inhibitors using a binary classification scheme. A GRNN model is presented that predicts the aqueous solubility of nitrogen- and oxygen-containing small organic molecules.
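The PNN and GRNN share the same kernel machinery: a PNN behaves as a Parzen-window Bayes classifier, and a GRNN is a kernel-weighted (Nadaraya-Watson) average of training targets. The sketch below illustrates that relationship in NumPy. It assumes Gaussian kernels, a single shared smoothing width `sigma`, and equal class priors; the thesis may use other choices (for example, per-descriptor widths), and the function names are hypothetical.

```python
import numpy as np


def _kernel_weights(X_train, x, sigma):
    """Pattern layer: Gaussian activation of every training compound for query x."""
    d2 = np.sum((X_train - x) ** 2, axis=1)  # squared Euclidean distances
    # Subtracting the smallest distance rescales all weights by one common factor,
    # which avoids underflow without changing the PNN argmax or the GRNN ratio.
    return np.exp(-(d2 - d2.min()) / (2.0 * sigma ** 2))


def pnn_classify(X_train, y_train, X_query, sigma=1.0):
    """Hypothetical PNN sketch: pick the class with the largest mean kernel activation."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    labels = []
    for x in np.asarray(X_query, dtype=float):
        w = _kernel_weights(X_train, x, sigma)
        # Summation layer: Parzen density estimate per class (equal priors assumed)
        scores = [w[y_train == c].mean() for c in classes]
        labels.append(classes[int(np.argmax(scores))])
    return np.array(labels)


def grnn_predict(X_train, y_train, X_query, sigma=1.0):
    """Hypothetical GRNN sketch: kernel-weighted average of training targets."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    preds = []
    for x in np.asarray(X_query, dtype=float):
        w = _kernel_weights(X_train, x, sigma)
        # After rescaling, the nearest pattern has weight 1, so w.sum() >= 1
        preds.append(np.dot(w, y_train) / w.sum())
    return np.array(preds)


# Toy one-descriptor example (hypothetical values)
X = [[0.0], [0.1], [1.0], [1.1]]
print(pnn_classify(X, [0, 0, 1, 1], [[0.05], [1.05]], sigma=0.2))  # [0 1]
print(grnn_predict([[0.0], [1.0]], [0.0, 10.0], [[0.5]], sigma=0.3))  # [5.]
```

The only trained quantity in both networks is `sigma`. The training compounds themselves serve as the hidden layer, so selecting a good descriptor subset matters more than fitting weights.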
For the applications presented, the predictive power of the PNN and GRNN models is found to be equivalent to that of previously examined methodologies, such as k-NN classification and multilayer feed-forward neural network (MLFN) function approximation, while requiring significantly fewer input descriptors.

Predictive quantitative structure-property relationships (QSPRs) are presented that link topological molecular structure and derived amino acid parameters with the ion mobility spectrometry collision cross sections of a set of 113 singly protonated, lysine-terminated peptides from a tryptic digest of common proteins. A trivial linear model using only the number of atoms as an independent variable is able to predict 88 of the 113 peptide collision cross sections (78%) to within 2% of their experimentally determined values. (Abstract shortened by UMI.)
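The trivial linear model mentioned above is a one-variable regression, scored by the fraction of predictions that fall within 2% of experiment. Below is a minimal sketch. It assumes an ordinary least-squares fit and relative error measured against the observed value. The function names and the toy numbers are hypothetical and are not the thesis's peptide data.

```python
import numpy as np


def fit_single_descriptor(x, y):
    """Ordinary least-squares fit of y = a + b * x with one independent variable."""
    b, a = np.polyfit(np.asarray(x, dtype=float), np.asarray(y, dtype=float), 1)
    return a, b  # polyfit returns the slope (highest degree) first


def fraction_within(y_obs, y_pred, rel_tol=0.02):
    """Fraction of predictions whose relative error is at most rel_tol (2% by default)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel_err = np.abs(y_pred - y_obs) / np.abs(y_obs)
    return float(np.mean(rel_err <= rel_tol))


# Hypothetical toy numbers (atom counts and cross sections), not real peptide data
n_atoms = [100, 150, 200, 250]
ccs_obs = [300.0, 402.0, 497.0, 601.0]
a, b = fit_single_descriptor(n_atoms, ccs_obs)
ccs_pred = a + b * np.asarray(n_atoms, dtype=float)
print(fraction_within(ccs_obs, ccs_pred))
```

For reference, 88 of 113 corresponds to 88/113 ≈ 0.779, which rounds to the 78% quoted in the abstract.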
Keywords/Search Tags:Molecular structure, Neural networks, QSAR, Generalized, Regression, Compounds, Used, PNN