
Variable Selection Methods And Their Applications In Quantitative Structure-Property Relationship (QSPR)

Posted on: 2006-09-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Peng
GTID: 1101360155463718
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Quantitative structure-activity/property relationships (QSAR/QSPR) have become an important branch of modern chemistry over the past decades. A fundamental goal of QSAR/QSPR studies is to predict complex physical, chemical, biological, and technological properties of chemicals from simpler descriptors, preferably ones calculated solely from molecular structure. Topological indices (TIs) are such numerical descriptors. They provide a convenient and inexpensive means of quantifying molecular structure, measuring molecular characteristics such as branching, shape, and size. To address the QSPR problems we encountered, this thesis covers the generalization and structural interpretation of topological indices and the application of variable selection methods in QSPR.
In the first part of this thesis, we decompose a large number of well-known topological indices into sets of topological character bases. Different sets of character bases carry different information about molecular structure, such as bonds or atoms, so each set of character bases spans a subspace of the whole topological information space. Using the topological character bases of the connectivity index χ, we try to explain the great success of the connectivity index in many QSAR and QSPR studies from a new point of view: the impersonality of χ's bond-weighting formula. We then propose recomposing some topological indices by adjusting the weights on their character bases according to the property or activity being modeled. This idea of recomposition is applied to the first Zagreb group index M1, and a large improvement is achieved. Furthermore, since the topological character bases span the information space and provide the original, natural information underlying topological indices, they may have more direct structural or physical interpretations and produce significantly better models than their mappings, the original topological indices.
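The abstract does not define the character bases explicitly. A natural reading, assumed here, is that for alkanes the atom-level bases are the counts n_k of carbons with vertex degree k, and the bond-level bases are the counts n_ij of bonds joining a degree-i and a degree-j carbon in the hydrogen-depleted graph. Under that assumption both indices are fixed-weight linear combinations of the bases: χ = Σ n_ij (ij)^(-1/2) and M1 = Σ n_k k². The sketch below (all function names hypothetical) computes the bases and rebuilds χ and M1 from them; supplying other weights is the recomposition step.

```python
from collections import Counter
import math  # used in the usage/test checks (math.sqrt)


def degrees(n_atoms, edges):
    """Vertex degrees of a hydrogen-depleted molecular graph given as an edge list."""
    deg = [0] * n_atoms
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def atom_bases(n_atoms, edges):
    """Assumed atom-type bases: n_k = number of carbons with degree k (k = 1..4)."""
    counts = Counter(degrees(n_atoms, edges))
    return {k: counts.get(k, 0) for k in range(1, 5)}


def bond_bases(n_atoms, edges):
    """Assumed bond-type bases: n_ij = number of bonds joining degree-i and degree-j atoms, i <= j."""
    deg = degrees(n_atoms, edges)
    counts = Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)
    return {(i, j): counts.get((i, j), 0) for i in range(1, 5) for j in range(i, 5)}


def chi_from_bases(bonds):
    """Randic connectivity index: each bond type carries the fixed weight (i*j)**-0.5."""
    return sum(n * (i * j) ** -0.5 for (i, j), n in bonds.items())


def m1_from_bases(atoms, weights=None):
    """First Zagreb index M1 = sum_k n_k * k**2; passing other weights gives a recomposed index."""
    if weights is None:
        weights = {k: k * k for k in atoms}
    return sum(weights[k] * n for k, n in atoms.items())
```

For n-butane (edges (0,1), (1,2), (2,3)) this gives χ = √2 + 1/2 ≈ 1.914 and M1 = 10. In a recomposed index, the `weights` dictionary would be fitted by regressing the target property on the bases; the thesis's fitted weights are not reproduced here.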
Using orthogonal block variables, the character-base sets are partitioned into blocks so that the most useful information is extracted from each information space. Regression on only a few new orthogonal block variables greatly improves both the fitting and the predictive ability of the correlation model. At the same time, block variables are linear projections of the original information spaces, which makes the resulting QSPR models easy to interpret.
The second part of the thesis concerns variable selection methods, their applications in QSPR, and some improvements to these methods. A new variable selection approach based on nonconcave penalized least squares is employed to interpret and predict the boiling points (BPs) of 530 alkanes. The proposed method performs well compared with stepwise regression and the improved least absolute shrinkage and selection operator (LASSO); together with its simplicity and speed, this makes it a valid competitor to existing variable selection methods. All 530 saturated hydrocarbons with 2 to 10 carbon atoms and 128 common topological indices are taken into account. Only 12 topological indices are selected from the 95 retained after pretreatment, yet they still give satisfactory fitting and prediction.
In the last part of the thesis, second-order fused penalized least squares is proposed for distinguishing Chinese Angelica from related Umbelliferae herbs using high-performance liquid chromatography (HPLC) fingerprints. The method penalizes both the coefficients and their differences. An iterative algorithm is derived, and it can be viewed as a general algorithm for all second-order fused penalized least-squares problems.
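The nonconcave penalized least-squares selection used for the alkane boiling points can be illustrated with a small sketch. The abstract does not name the penalty or the solver, so both are assumptions here: the SCAD penalty (with the commonly used a = 3.7) is a standard nonconcave choice, and cyclic coordinate descent on standardized predictors is one common way to minimize (1/2n)‖y − Xβ‖² + Σ p_λ(|β_j|). The function names are hypothetical; in the thesis setting X would hold the pretreated topological indices and y the boiling points.

```python
import numpy as np


def scad_threshold(z, lam, a=3.7):
    """Minimizer over b of 0.5*(z - b)**2 + SCAD_lam(|b|) for one standardized coordinate."""
    abs_z = abs(z)
    if abs_z <= 2 * lam:
        # Small effects: lasso-style soft thresholding (can set b exactly to zero).
        return np.sign(z) * max(abs_z - lam, 0.0)
    if abs_z <= a * lam:
        # Intermediate effects: shrinkage tapers off as |z| grows.
        return np.sign(z) * (abs_z - a * lam / (a - 1)) / (1 - 1 / (a - 1))
    # Large effects are left unshrunk, unlike the lasso.
    return z


def scad_least_squares(X, y, lam, a=3.7, n_iter=200, tol=1e-8):
    """Coordinate descent for (1/2n)||y - X b||^2 + sum_j SCAD_lam(|b_j|).

    Columns of X are centered and scaled so that (1/n) x_j'x_j = 1
    (constant columns must be removed beforehand), and y is centered.
    Returns the coefficients on the standardized scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xs = X - X.mean(axis=0)
    Xs = Xs / np.sqrt((Xs ** 2).mean(axis=0))
    r = y - y.mean()          # residual for beta = 0
    beta = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Univariate least-squares target for coordinate j given the others.
            z = Xs[:, j] @ r / n + beta[j]
            new = scad_threshold(z, lam, a)
            if new != beta[j]:
                r -= Xs[:, j] * (new - beta[j])   # keep residual in sync
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return beta
```

Predictors whose coefficients end at exactly zero are the ones dropped. In practice λ would be chosen by cross-validation or an information criterion, a step omitted from this sketch.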
Keywords/Search Tags: Variable selection, Topological character bases, Block variables, Penalized least squares, Variable fusion