
Quantitative Structure-Activity/Property Relationship Studies In Biomolecules Based On Partial Least Squares And Support Vector Machine

Posted on: 2012-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: P X Li
GTID: 2131330341950382
Subject: Analytical Chemistry
Abstract/Summary:
Quantitative Structure-Property/Activity Relationship (QSPR/QSAR) study is an important applied branch of chemoinformatics. The method combines computational methods with various statistical analysis tools to build quantitative relationships between the structures of compounds and their properties. It can be used both to build theoretical models that predict physicochemical properties and to find the structural features responsible for those properties. From the resulting models, the influence of compound structure on the properties can be understood at the molecular level. At present, the method has been introduced into the fields of drug design, analytical chemistry, environmental science, food science and materials science.

The relationship between molecular structure and the studied property/activity may be linear or nonlinear. For linear systems, partial least squares regression (PLS) is a main modeling method in QSPR/QSAR studies, because it can analyze data with strong multicollinearity. For nonlinear systems, the support vector machine (SVM) is generally used to solve nonlinear regression problems. This dissertation discusses in detail how PLS and SVM were used to build linear and nonlinear models, respectively, in QSPR/QSAR studies. It consists of four chapters, described as follows:

1. Chapter one gives a brief description of the current status, basic principles and application prospects of QSPR/QSAR. Much emphasis is put on the stage of research in QSPR/QSAR. It also presents the basic principles of the two methods: the support vector machine (SVM) and partial least squares regression (PLS).

2. QSAR in the prediction of porphyrins as telomerase inhibitors. Taking as the research object the activity factor D, which was calculated from the percentage inhibition of telomerase by porphyrin derivatives, QSAR models were established with the PLS and SVM methods to predict the activity factor D of 32 porphyrin derivatives. Stepwise regression and principal components analysis (PCA) were employed to select the parameters, and the different selected parameters were used to establish the prediction models. The results obtained by the SVM model are much better than those of the PLS model. The SVM model, with high statistical significance (R2 = 0.9170, RMSE = 0.1663), could be used to predict the biological activity of porphyrin derivatives. Furthermore, analysis of the optimal models shows that the main parameters influencing the activity factor D are the electrostatic descriptors.

3. A QSPR study of the absorption maxima (λmax) of 34 porphyrin compounds. Six descriptors were selected by stepwise regression and used as the input to construct linear PLS and nonlinear SVM models, respectively. SVM performs better than PLS. For the whole set, SVM gave a predictive squared correlation coefficient (R2) of 0.9293, better than the corresponding value of 0.8932 given by the PLS model. It was concluded from the two models that the main factors influencing the absorption maxima (λmax) of porphyrin compounds are electronic effects, molecular composition, steric effects and polarity.

4. A QSAR model relating 60 metal complexes to their structural features was established by PLS and SVM. Principal components analysis (PCA) was employed to reduce dimensionality and select an effective variable subset for predicting their binding constant K. Twenty descriptors that can influence the interaction between the selected metal compounds and DNA were investigated; three descriptors with a total contribution of 90.79% were selected and used as the input to construct the models. The predictive ability of the PLS and SVM models was compared, and the results showed that the novel SVM is a promising tool for QSAR study. For the whole set, SVM gave a predictive squared correlation coefficient (R2) of 0.8926 and an AARD of 3.24%.
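
The abstract reports model quality as R2, RMSE and AARD without giving formulas. The following is a minimal sketch of how such statistics are commonly computed, assuming R2 is the squared Pearson correlation between observed and predicted values and AARD is the average absolute relative deviation in percent; the thesis may define them differently. The function names and the sample values are hypothetical and are not data from the thesis.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def squared_correlation(observed, predicted):
    """Squared Pearson correlation coefficient (assumed meaning of R2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def aard_percent(observed, predicted):
    """Average absolute relative deviation, in percent (assumed meaning of AARD)."""
    n = len(observed)
    return 100.0 / n * sum(abs((p - o) / o) for o, p in zip(observed, predicted))


# Hypothetical observed and predicted values, for illustration only.
observed = [4.0, 5.0, 6.0, 8.0]
predicted = [4.2, 4.9, 6.3, 7.6]

print("RMSE:", rmse(observed, predicted))
print("R2:  ", squared_correlation(observed, predicted))
print("AARD:", aard_percent(observed, predicted), "%")
```

Computing the same three statistics for the PLS and SVM predictions on one data set gives the kind of side-by-side comparison the abstract describes. AARD is undefined when an observed value is zero, so this sketch only suits strictly nonzero targets.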
Keywords/Search Tags: Partial Least Squares, Support Vector Machine, QSPR/QSAR, Stepwise Regression, Principal Components Analysis