Methods to improve the reliability, validity and interpretability of QSAR models

Posted on: 2006-08-15
Degree: Ph.D
Type: Thesis
University: The Pennsylvania State University
Candidate: Guha, Rajarshi
Full Text: PDF
GTID: 2451390005998246
Subject: Chemistry
Abstract/Summary:
Quantitative structure-activity relationship (QSAR) models are a statistical solution to the problem of directly calculating physical and biological properties of molecules from their physical structure. The direct prediction of properties is in general not feasible, owing either to a lack of computing resources or to a lack of knowledge about the relationship between structure and property. The goal of a QSAR model is to extract information from a set of numerical descriptors characterizing molecular structure and to use this information to develop inductively a relationship between structure and property. Two important questions arise during the modeling process. First, are the data used to build the model representative of the whole dataset, and can the model be extended to predict properties for new molecules? Second, given that a model encodes information about the structures of molecules and relates this to their properties, can we extract and interpret the encoded information? The work reported in this thesis focuses on the validation and interpretation of QSAR models, and presents both applications of interpretation techniques and the development of validation and interpretation methodologies.

The thesis focuses on three main topics. First, a method to create QSAR sets, which are used in the modeling process, is described. The motivation for this work is to improve the reliability of the resultant QSAR models and to provide an alternative to methods based on statistical design. Second, a method is described by which the validity of a QSAR model can be ascertained when it is faced with observations that it has not been trained on or validated with. The method is both simple in nature and generalizable to arbitrary regression models. Finally, the thesis focuses on aspects of the interpretation of QSAR models, both linear and nonlinear. Applications of the PLS interpretation method to linear regression models of biological properties are described. In addition, the development of methods that provide both broad and detailed interpretations of neural network QSAR models is described.
Keywords/Search Tags: QSAR models, Improve the reliability, Method, Interpretation, Structure, Described