Quantitative structure property relationship modeling for protein chromatography separation systems

Posted on: 2006-03-13
Degree: Ph.D
Type: Thesis
University: Rensselaer Polytechnic Institute
Candidate: Zhuang, Dechuan
Full Text: PDF
GTID: 2451390008472366
Subject: Chemistry
Abstract/Summary:
Quantitative structure-activity relationship (QSAR) research has grown rapidly over the past two decades. Applied in various fields, QSAR studies can build predictive models and shed light on the physical mechanisms of the underlying phenomena. In both academic and industrial settings, more advanced modeling methods are needed to meet the demands of increasingly complex real systems.

One of the important issues in QSAR, feature selection (or descriptor selection), is explored extensively in this thesis. Three subjective feature selection methods, Sparse SVM, GA/PLS and Sensitivity Analysis, were compared in a range of situations. Both synthetic and semi-synthetic datasets were generated to test the validity of the feature selection methods. SVM was found to give the best performance for data generated from a linear relationship, while GA/PLS and Sensitivity Analysis gave better results for data with a non-linear relationship. Building on this, a new method based on ℓ1-norm support vector regression was introduced for feature selection under multiple responses. By considering more than one response simultaneously, the method can select a single set of descriptors that can be used to build predictive models for all responses. In addition, it may help in interpreting the models by disclosing more information about the fundamental mechanisms behind the responses. Using data collected on protein chromatography separation, a common set of descriptors was found that is consistent with the known mechanism. Predictive models built with only the selected descriptors showed good predictive ability.

Virtual high throughput screening, a technique widely applied in drug discovery, was adapted to find novel selective displacers for displacement chromatography. A binary SVM classification model was built from a small set of displacers that had been tested previously. The model was then used to screen a large database of molecules. Seven of the 15,155 molecules were predicted to be selective; in follow-up experimental tests, three have shown significant selectivity so far. Several independent QSAR models were also built, and most of them agree with the binary SVM classification model. We therefore concluded that virtual high throughput screening can be successfully applied to chromatography systems.
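
The idea of selecting one common descriptor set for several responses can be illustrated with a minimal sketch. It is not the thesis's method: instead of ℓ1-norm support vector regression, it uses ℓ1-penalized least squares (lasso) solved by coordinate descent, and it assumes a simple rule of keeping every descriptor that receives a nonzero weight for at least one response. The function names, the penalty value and the toy data are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 shrinkage step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, penalty, sweeps=200):
    """Minimise (1/2n)*||y - Xw||^2 + penalty*||w||_1 by cyclic coordinate descent.

    Assumption: this stands in for the l1-norm SVR named in the abstract.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero descriptor carries no information
            # Correlation of descriptor j with the residual that ignores j itself.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def select_common_descriptors(X, responses, penalty, tol=1e-8):
    """Return indices of descriptors with a nonzero weight for any response.

    Assumption: the union rule is an illustrative choice, not taken from the thesis.
    """
    selected = set()
    for y in responses:
        w = lasso_coordinate_descent(X, y, penalty)
        selected.update(j for j, wj in enumerate(w) if abs(wj) > tol)
    return sorted(selected)


# Toy data: three mutually orthogonal descriptors, two responses.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y_first = [2, 2, -2, -2]    # depends only on descriptor 0
y_second = [-3, 3, -3, 3]   # depends only on descriptor 1
common = select_common_descriptors(X, [y_first, y_second], penalty=0.1)
```

In this toy case the third descriptor is unrelated to both responses, so the ℓ1 penalty drives its weight to zero and it is left out of the common set. A joint formulation that penalizes all responses together would be closer in spirit to the multi-response method described in the abstract.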
Keywords/Search Tags:Chromatography, Relationship, Model, SVM, Applied, QSAR, Feature selection