Application Of QSPR/QSAR Studies In Medicinal Chemistry, Analytical Chemistry And Environmental Science

Posted on: 2008-06-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Y Ren
Full Text: PDF
GTID: 1101360215457972
Subject: Analytical Chemistry
Abstract/Summary:
Quantitative structure-property/activity relationship (QSPR/QSAR) study, a branch of quantitative drug design research, was first applied in the biological field and developed in response to the need for rational design of bioactive molecules. With the rapid development and extensive application of computer science, QSPR/QSAR studies have entered a new era and are now widely used in biology, medicinal science, chemistry and environmental science. Using different statistical methods, we aim to develop theoretical models that can not only predict the properties of compounds that have not yet been synthesized but also identify and describe the structural features of molecules that are relevant to variations in molecular properties, thereby giving insight into the structural factors affecting those properties and providing information for the functional design of molecules.
The relationship between molecular structure and the studied property/activity may be linear or nonlinear. For a linear system, simple regression analysis is sufficient. For a nonlinear system, a simple nonlinear problem can be mathematically transformed into a linear one, or the relationship can be fitted by choosing an appropriate nonlinear function. If the problem is more complex, i.e. the cause-and-effect relationship is not obvious or the inference rule is uncertain, and the above two approaches fail, the most appropriate course is to use machine learning methods such as artificial neural networks (ANN), support vector machines (SVM) and projection pursuit regression (PPR) to approximate the regression function.
In this dissertation, Chapter 1 gives a brief description of the QSPR/QSAR principle and its research status, with emphasis on the realization process of QSPR/QSAR.
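The idea of transforming a simple nonlinear problem into a linear one, mentioned above, can be illustrated with a minimal sketch. It assumes an exponential relationship y = a * exp(b * x), which becomes linear after taking logarithms: ln(y) = ln(a) + b * x. The functional form, the function names and the synthetic data are illustrative assumptions and are not taken from the dissertation.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing ln(y) on x (requires all y > 0)."""
    intercept, slope = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), slope

# Synthetic, noise-free data generated from y = 2 * exp(0.5 * x) (hypothetical values)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```

With noisy data, fitting in log space weights the errors differently from a direct nonlinear fit, which is one reason a dedicated nonlinear function or a machine learning method may be preferred when the transformation is not a good match.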
Three kinds of statistical learning methods were used to study the relationship between structure and property/activity for various systems.
In Chapter 2, we summarize the application of linear regression methods in QSPR/QSAR studies, as briefly described below.
(1) The heuristic method (HM) was applied to develop a quantitative structure-property model to predict the nematic transition temperatures of 42 homogeneous thermotropic liquid crystals. The optimal linear model contained five descriptors that reflect the structural features affecting the nematic-isotropic transition. For the test set, this model gave a high predictive correlation coefficient (R) of 0.96, a low root mean squared error (RMSE) of 6.3654 and an average absolute relative deviation (AARD) of 9.2017%, all better than the multiple linear regression (MLR) result reported in the reference. In addition, the heuristic method is simple, practical and effective, and it can be extended to other QSPR investigations.
(2) The best multi-linear regression (BMLR) method was applied to systematically search multi-linear regression equations and develop quantitative structure-retention models for predicting the GC×GC-TOFMS retention times of 209 PCB congeners. The best regression model involved four descriptors related to the GC×GC-TOFMS chromatographic retention of PCBs. The model was validated in two ways, by a test set and by a 6-fold cross-validation procedure, and both showed good predictive ability. For the test set, the model gave a predictive correlation coefficient (R) of 0.988 and an AARD of 3.08%. The average AARD from the 6-fold cross-validation was 3.21%, consistent with the test-set value, indicating that the model is statistically stable and reliable.
(3) The heuristic method was applied to build multiple linear regression models for predicting the gas chromatographic Kovats retention indices of 150 acyclic C5-C8 alkenes on two stationary phases (polydimethylsiloxane, PDMS, and squalane, SQ). This work is the first application of 3D topographic connectivity indices to quantitative structure-retention relationship studies. The 3D topographic connectivity indices can describe the 3D spatial electronic structures of molecules more accurately. These descriptors, combined with five other kinds of descriptors calculated by the CODESSA software, were correlated with the GC retention indices. The resulting quantitative structure-retention relationship (QSRR) models gave predictive R2 values of 0.970 and 0.958 and AARD values of 1.37% and 1.52% for retention indices on the PDMS and SQ columns, respectively. 1Ωp, a three-dimensional (3D) topographic index, was found to play the most important role in describing the chromatographic retention behavior of the alkenes on these two stationary phases. Moreover, this index could completely distinguish different alkene isomers. It may therefore also be extended to distinguish isomers of other compounds and so describe their quantitative structure-retention relationships well.
In Chapter 3, we summarize the application of the support vector machine (SVM) in QSPR/QSAR studies, as briefly described below.
(1) SVM was applied to develop a nonlinear binary classification model of skin sensitization for a diverse set of 131 organic compounds. Six descriptors selected by stepwise forward linear discriminant analysis (LDA) were used as inputs to the SVM model. The nonlinear SVM model correctly classified 89.77% and 72.09% of the compounds in the training and test sets, respectively, higher than the 79.55% and 67.44% achieved by the LDA model, indicating that the SVM model was more accurate in recognizing skin sensitizers.
A 10-fold cross-validation procedure was also performed to account for differences among the data points; its results were very similar to those of the nonlinear SVM model, indicating that the model was statistically stable.
(2) SVM was applied to develop an accurate QSPR model for predicting the cloud points of 62 polyoxyethylene-type nonionic surfactants and to study the cloud phenomena of nonionic surfactants in aqueous solution. A total of 88 descriptors were calculated and used in the regression analysis against the cloud point. Four descriptors were selected by HM as the inputs of the MLR and SVM models. Very satisfactory results were obtained. The SVM model performed better in both fitting and prediction, indicating its good generalization capability. For the test set, it gave a predictive squared correlation coefficient (R2) of 0.9765, an RMSE of 4.2727 and an AARD of 9.5490%, better than the corresponding values of 0.9318, 8.0824 and 16.1955% from the MLR model.
(3) SVM was applied to develop a QSPR model based on theoretical molecular descriptors that take into account the different features of chemical structures related to hydrogen-bond acidity for 137 compounds. Five descriptors were selected by HM and used as inputs to construct nonlinear radial basis function neural network (RBFNN) and SVM models, respectively.
SVM performed better than RBFNN. For the test set, it gave a predictive R2 of 0.9204, an RMSE of 0.0588 and an AARD of 15.16%, better than the corresponding values of 0.8655, 0.0772 and 24.46% given by the RBFNN model. The predictions are in good agreement with the experimental values.
(4) SVM was applied to construct a QSPR model correlating molecular structural features with the rate constants of 112 acyclic carbons and aromatic carbons during degradation by the NO3 radical in the troposphere. Four descriptors were selected by HM and used as inputs to construct RBFNN and SVM models, respectively. SVM performed better than RBFNN and HM. For the test set, SVM gave a predictive squared correlation coefficient (R2) of 0.950, an RMSE of 0.577 and an AARD of 3.343%. The predictions are in good agreement with the experimental values.
In Chapter 4, we summarize the application of projection pursuit regression (PPR) methods in QSPR/QSAR studies, as briefly described below.
(1) PPR was applied to develop a correlation model between the structural features of 116 organic compounds and their rate constants for reaction with ozone in the troposphere. Seven descriptors selected by HM were used as inputs for the MLR, SVM and PPR studies. The PPR model performed best in both fitting and prediction. For the test set, it gave a predictive R2 of 0.955, an RMSE of 1.041 and an AARD of 4.663%, better than the corresponding values of the MLR model (R2=0.824, RMSE=1.342, AARD=5.895%) and the SVM model (R2=0.875, RMSE=1.165, AARD=4.896%). The results show that PPR is a useful tool for solving nonlinear problems in QSPR.
(2) A quantitative structure-activity relationship (QSAR) study was carried out on a set of thyroid hormone receptor β1 (TRβ1) antagonists, which are of special interest because of their potential role in safe therapies for nonthyroid disorders that avoid cardiac side effects. Six molecular descriptors selected by a genetic algorithm (GA) were used as inputs for an MLR analysis and a PPR study, with the aim of developing a more accurate model correlating structural features with binding activity. Very good results were obtained. The PPR model performed better than the MLR model in both fitting and prediction. For the test set, it gave a predictive correlation coefficient (R) of 0.9450, an RMSE of 0.4498 and an AARD of 4.19%, whereas the MLR model gave only an R of 0.8505, an RMSE of 0.7172 and an AARD of 8.28%, confirming the ability of PPR to predict the binding affinities of compounds to the β1 isoform of the human thyroid hormone receptor (TRβ1).
(3) PPR was used in a quantitative structure-property relationship study to model the melting points of a diverse set of 288 potential ionic liquids (ILs), including pyridinium bromides, imidazolium bromides, benzimidazolium bromides and 1-substituted 4-amino-1,2,4-triazolium bromides. HM was used to select the best set of molecular descriptors, and the selected descriptors were then used as inputs to construct both an MLR model and a nonlinear PPR model. Satisfactory results were obtained for this complex system. The PPR model gave a high R2 of 0.810 and a small AARD of 17.75%, better than those of the HM model (R2=0.712, AARD=24.33%), indicating that PPR is better suited to predicting the melting points. In addition, the descriptors selected by HM give some insight into the factors that affect the melting points, i.e. benzene ring structure, rotatable bonds, branching, symmetry and intramolecular electronic effects.
This information would be very useful in the design of the potential ILs with desired melting points.
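The figures of merit quoted throughout this abstract (R, RMSE and AARD) and the k-fold cross-validation used to check model stability can be sketched as follows. The abstract does not define these quantities, so the formulas below are the conventional ones and should be read as assumptions; in particular, AARD is taken as the mean of |predicted - observed| / |observed| expressed in percent, and the interleaved assignment of samples to folds is only one of several possible schemes. All function names and sample values are hypothetical.

```python
import math

def pearson_r(observed, predicted):
    """Correlation coefficient R between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def aard_percent(observed, predicted):
    """Average absolute relative deviation in percent (observed values must be nonzero)."""
    n = len(observed)
    return 100.0 * sum(abs((p - o) / o) for o, p in zip(observed, predicted)) / n

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; fold f holds out samples f, f + k, f + 2k, ..."""
    for fold in range(k):
        test_idx = [i for i in range(n_samples) if i % k == fold]
        train_idx = [i for i in range(n_samples) if i % k != fold]
        yield train_idx, test_idx

# Hypothetical observed and predicted values, for illustration only
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(pearson_r(observed, predicted), rmse(observed, predicted),
      aard_percent(observed, predicted))

# Fold layout for a 6-fold cross-validation over 12 samples
for train_idx, test_idx in k_fold_indices(12, 6):
    print(test_idx)
```

In a cross-validation run, a model would be refitted on each training subset and the metrics averaged over the held-out folds, which is how an average AARD such as the 6-fold value reported in Chapter 2 would typically be obtained.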
Keywords/Search Tags: Chemoinformatics, QSPR/QSAR, Genetic Algorithm, Artificial Neural Network, Support Vector Machine, Projection Pursuit Regression