
Comparison Of The QSAR In The Variable Selection And Its Application

Posted on: 2006-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: J W Shen
Full Text: PDF
GTID: 2191360182468353
Subject: Analytical Chemistry
Abstract/Summary:
QSAR and QSPR form a main branch of chemometrics research. QSAR is a predictive technique based on the relationship, for a series of chemicals, between some form of biological activity and some measure of physicochemical or structural properties. It has been widely used in theoretical and computational chemistry, environmental chemistry, medicinal chemistry, and even the life sciences. This thesis covers the evaluation of structural descriptors, variable selection, and the application of these methods in practice.

The first part (Chapter 2) focuses on the correlation between variables. A new method, named the subspace comparison method, is proposed to investigate the relationship between block variables from a high-dimensional point of view. The method can be used not only to measure the correlation between variables but also as a criterion for variable selection. Five kinds of popular topological block variables are calculated for 530 saturated hydrocarbons. The relationships among them are studied, as is a model relating boiling point to the three least correlated block variables. The standard error is 4.08 and the correlation coefficient reaches 0.9948. The RMSECV of leave-one-out cross-validation is 4.38. The result is better than those in the literature in both regression and prediction.

The second part (Chapter 3) deals with variable selection. Traditional methods tend to become trapped in local optima, so we look for a robust selection procedure that overcomes this shortcoming. We propose a sequential method that combines forward selection with combination search to select the optimal variables. Applied to a real system, it proved to be a promising approach to variable selection.

The last part addresses a practical viscosity problem, since viscosity is an important parameter in chemical engineering and petroleum chemistry. An ideal viscosity model for a set of 532 structurally diverse compounds has not yet been built, precisely because of this diversity. To date, the best reported model reaches a correlation coefficient of only 0.92. Four kinds of block variables are calculated, and the relationships among them are also examined with the subspace comparison method. An improved subspace orthogonal method is established to orthogonalize the block variables. 300 compounds are selected as the calibration set by uniform design, and the rest are used for prediction. The resulting viscosity model has good statistical characteristics: R reaches 0.95, s is 0.45, and the prediction error is 0.49, close to s. These results indicate that the model outperforms those previously reported in the literature.
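As an illustration of how the correlation between two block variables can be measured from a subspace point of view, the following is a minimal sketch in Python with NumPy. It computes the canonical correlations between the column spaces of two descriptor blocks, that is, the cosines of the principal angles between the two subspaces. It is an assumption that this resembles the subspace comparison method of the thesis, which the abstract does not specify. The function name `canonical_correlations` and the synthetic data are hypothetical, and both blocks are assumed to have full column rank.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between two blocks of variables.

    X and Y are (n_samples, n_vars) arrays sharing the same rows.
    Assumes each block has full column rank (an assumption of this sketch).
    """
    # Center each column so that correlations, not raw cosines, are measured.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column space of each block (reduced QR).
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Guard against tiny floating-point excursions outside [0, 1].
    return np.clip(s, 0.0, 1.0)


# Synthetic example: block B is an invertible linear mix of block A,
# so the two blocks span the same subspace; block C is unrelated noise.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
M = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])
B = A @ M
C = rng.normal(size=(50, 2))

print("A vs B:", canonical_correlations(A, B))
print("A vs C:", canonical_correlations(A, C))
```

Canonical correlations close to 1 suggest that one block carries little information beyond the other, which is the kind of redundancy a variable selection criterion would penalize.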
Keywords/Search Tags: Topological index, Block variable, Subspace comparison method, Subspace orthogonal method, Canonical correlation