Some Novel Calculated Models Of Molecular Descriptors And Their Application In Quantitative Structure-Property Relationships

Posted on: 2017-01-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Yu
Full Text: PDF
GTID: 1221330488977088
Subject: Analytical Chemistry
Abstract/Summary:
The research work in this thesis focuses on new calculated models of molecular descriptors for quantitative structure-property/activity relationship (QSPR/QSAR) studies, and on computer-aided aptamer selection and design in SELEX (systematic evolution of ligands by exponential enrichment) experiments. Generally, the molecular descriptors used in QSPR studies, especially quantum chemical descriptors, are calculated for molecules in vacuum and in their ground states, where the molecular structures cannot be affected by other molecules. Furthermore, molecular descriptors are generally obtained from whole molecules. To obtain molecular descriptors and carry out QSPR studies, we adopt new calculated models such as transition state structures, the integral equation formalism polarizable continuum model (IEF-PCM), and the functional loops of aptamers. The main content of the work is divided into six chapters, as follows.

The first chapter reviews QSPR studies, including data set selection, calculation models of molecular descriptors, and the statistical principles for developing QSPR models.

The second chapter focuses on predicting the Q-e parameters from transition state structures with QSPR models. The Q-e scheme has proved remarkably useful for interpreting and predicting the reactivity of a monomer in free-radical copolymerizations. In the present work, two support vector regression (SVR) models were developed to predict the parameters Q and e of the Q-e scheme, respectively. The quantum chemical descriptors used to build the SVR models were, for the first time, calculated from transition state species with the structures C1H3-C2HR3· or ·C1H2-C2H2R3, formed from the vinyl monomer C1H2=C2HR3 + H·. The optimal ν-SVR model of ln Q (C = 130, ν = 0.2, γ = 1.0), based on 70 monomers, has a root mean square (rms) error of 0.336 and a correlation coefficient (R) of 0.982. The optimal ε-SVR model of e (C = 1.2, γ = 3, ε = 10⁻²) gives rms = 0.259 and R = 0.963.
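The rms error and correlation coefficient R quoted throughout this abstract are standard regression statistics. A minimal sketch of how they are conventionally computed, assuming R is the Pearson correlation between observed and predicted values (the abstract does not give either formula); the function names and the toy numbers are illustrative only, not data from the thesis:

```python
import math


def rms_error(observed, predicted):
    """Root mean square error: sqrt(mean((observed - predicted)^2))."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def correlation_coefficient(observed, predicted):
    """Pearson correlation coefficient R between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)  # sum of squares, observed
    ss_p = sum((p - mean_p) ** 2 for p in predicted)  # sum of squares, predicted
    return cov / math.sqrt(ss_o * ss_p)


# Made-up observed vs predicted ln Q values, for illustration only
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rms_error(observed, predicted))
print(correlation_coefficient(observed, predicted))
```

Both statistics are computed on the same observed/predicted pairs, so they can be reported side by side for a training set or an external test set.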
Compared with previous models, the SVM models in this thesis have better predictive performance. The results suggest that calculating quantum chemical descriptors from transition state structures to predict the parameters Q and e of the Q-e scheme is feasible. This investigation encourages further application of transition state descriptors in other QSPR/QSAR studies.

The third chapter concerns the prediction of Setschenow constants based on integral equation formalism polarizable continuum model (IEF-PCM) calculations and QSPR models. The Setschenow constant Ksalt of a compound in NaCl solution is an important parameter. The dissolution environment influences the geometrical structures, energies, charge distributions, and other properties of solutes. Molecular geometries were optimized with the IEF-PCM treatment of solvent effects, together with density functional theory (DFT) using Becke's three-parameter hybrid functional and the Lee-Yang-Parr gradient-corrected correlation functional (B3LYP) at the 6-31G(d) level. Single-point energy calculations and natural bond orbital analyses were carried out with the same method. From 1672 generated molecular descriptors, four were selected with a genetic algorithm (GA) combined with multiple linear regression (MLR) to develop models for the Ksalt of 101 organic compounds. The optimal MLR and support vector machine (SVM) models of Ksalt have mean root mean square (rms) errors of 0.0287 and 0.0227, respectively. Compared with previous models, the two models in this thesis show better statistical performance. The results suggest that calculating molecular descriptors with IEF-PCM to predict the Setschenow constants Ksalt of organic compounds in NaCl solution is feasible.

The fourth chapter concerns the prediction of glass transition temperatures (Tg) of polymethacrylates from chain segment structures.
Tg is the most important parameter of an amorphous polymer material. A QSPR model of the glass transition temperatures of 56 polymethacrylates was obtained by MLR. The model uses three molecular descriptors calculated from chain segments of the polymer backbones comprising 10 repeating units. The training set (36 polymethacrylates) has a correlation coefficient (R) of 0.971 and a standard error of estimation of 15.731 K. The external test set of 20 polymethacrylates has a correlation coefficient of 0.946 and a root mean square (rms) error of 17.286 K. The mean relative error for the whole data set (56 polymethacrylates) is 4.065%. The results indicate that the present model can estimate the glass transition temperatures of polymethacrylates. The investigation demonstrates the usefulness of chain segments as representative structures of polymers, an approach that could be further applied in QSPR studies of other polymer properties.

The fifth chapter concerns the recognition of candidate aptamer sequences for human hepatocellular carcinoma in SELEX screening using structure-activity relationships. Selecting and synthesizing aptamers with high affinity and specificity for human hepatic carcinoma cells would be of critical importance for the early diagnosis of liver cancer. This thesis is the first report of pattern recognition applied to SELEX-based aptamer screening, using the support vector classification (SVC) technique for a two-class problem. Candidate aptamer sequences showing different degrees of affinity and specificity for SMMC-7721 liver carcinoma cells were selected through whole-cell SELEX. After 1670 molecular descriptors were calculated, 13 descriptors were selected and compressed into 6 latent variables, which served as the inputs for the classification models.
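The abstract does not state how the 13 selected descriptors were compressed into 6 latent variables; the sixth chapter uses principal component analysis (PCA) for the analogous step, so the sketch below assumes PCA on autoscaled descriptors. The function name, the autoscaling choice, and the random matrix standing in for real descriptor values are all hypothetical:

```python
import numpy as np


def pca_latent_variables(X, n_components):
    """Compress a descriptor matrix X (rows = sequences, columns = descriptors)
    into n_components principal-component scores."""
    X = np.asarray(X, dtype=float)
    # Autoscale each descriptor to zero mean and unit variance (assumed preprocessing)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant descriptors
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    # Fraction of total variance captured by each retained component
    explained_ratio = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, explained_ratio


# Hypothetical data: 40 sequences x 13 selected descriptors
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 13))
scores, ratio = pca_latent_variables(X, 6)
print(scores.shape)  # (40, 6)
```

The score columns are mutually uncorrelated, which is one reason a compressed set of latent variables is convenient as classifier input.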
The predicted fractions of winner aptamers from the 3rd, 5th, 7th, 9th, 11th, and 13th rounds of SELEX selection are 0.033, 0.427, 0.678, 0.828, 0.912, and 0.983, respectively, which conforms to the evolutionary principle of SELEX-based aptamer screening. In the pattern recognition analysis based on a structure-activity relationship model, 6 candidate DNA aptamer sequences belong to the class with high affinity and specificity; these sequences have experimental dissociation constants Kd in the nanomolar range. The feasibility of applying pattern recognition to the design and selection of aptamers has been demonstrated.

The sixth chapter focuses on pattern recognition of the enrichment levels of SELEX-based candidate aptamers for human C-reactive protein (CRP), and on hierarchical cluster analysis of CRP candidate aptamers. Selecting and synthesizing aptamers for human C-reactive protein would be of critical importance in predicting the risk of cardiovascular disease. The enrichment level of DNA aptamers is an important parameter for selecting candidate aptamers for further affinity and specificity determination. This thesis is the first report of pattern recognition applied to CRP aptamer enrichment levels in the SELEX process using structure-activity relationship models. After 10 rounds of graphene oxide (GO)-SELEX selection and the generation of 1670 molecular descriptors, 8 molecular descriptors were selected and reduced to five latent variables by principal component analysis (PCA) to develop the support vector classification (SVC) model. The SVC model (C = 8.1728, γ = 0.2333), optimized by the particle swarm optimization algorithm, achieves an accuracy of 88.15% on the training set.
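The SVC hyperparameters C and γ above were tuned by particle swarm optimization (PSO). A minimal global-best PSO sketch is shown below. The inertia and acceleration constants are common textbook values, searching in log10 space is an assumed convention, and `surrogate_cv_error` is a hypothetical stand-in objective; in a real run it would train and cross-validate an SVC for each candidate (C, γ):

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Minimize objective(x) over the box bounds = [(lo, hi), ...] with a
    basic global-best particle swarm optimization (PSO)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal best of each particle and the swarm's global best
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical surrogate for cross-validated SVC error over (log10 C, log10 gamma)
def surrogate_cv_error(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2


best, err = pso_minimize(surrogate_cv_error, [(-2.0, 3.0), (-3.0, 1.0)])
C, gamma = 10 ** best[0], 10 ** best[1]
```

Because PSO only needs objective values, the same loop works unchanged whether the objective is this toy surface or an expensive cross-validation run.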
Predictions of enrichment levels for the sequences with frequencies of 6 and 5 are reasonable, with accuracies of 70.59% and 76.37%, respectively.

For the SELEX experiment, a practically important question is how to select candidate aptamers from SELEX products for further affinity determination. In this thesis, hierarchical cluster analysis was used to cluster candidate DNA aptamers that bind the CRP target. In the cluster analysis, two molecular descriptors were used as characteristic variables, the median clustering method was used to calculate the distance between two clusters, and the squared Euclidean distance was used to measure the dissimilarity between two cases (or clusters). In total, 4609 candidate aptamer sequences for the CRP target were divided into two clusters. The clustering results for nine DNA sequences with characterized binding affinities were found to agree with the experimental studies. The feasibility of applying hierarchical cluster analysis to select candidate aptamers from SELEX products for further affinity determination has been demonstrated. Based on the classification results for the CRP candidate aptamers, 10 sequences were selected that potentially have both high enrichment levels and high binding affinities for CRP.
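The clustering procedure described above (median method, squared Euclidean distance, cut at two clusters) can be sketched as a naive agglomerative algorithm using the Lance-Williams update for the median method. This is an illustrative sketch, not the software used in the thesis: the function name and the six two-descriptor points are hypothetical, and its cubic running time means an optimized library routine would be preferable for thousands of sequences:

```python
def median_linkage(points, n_clusters):
    """Agglomerative hierarchical clustering with the median method on
    squared Euclidean distances, stopped when n_clusters clusters remain.
    Returns the clusters as sorted lists of point indices."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> members
    dist = {}  # (smaller id, larger id) -> distance between clusters
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            dist[(a, b)] = sum((x - y) ** 2 for x, y in zip(points[a], points[b]))

    def d(a, b):
        return dist[(a, b)] if a < b else dist[(b, a)]

    next_id = len(points)
    while len(clusters) > n_clusters:
        # Merge the closest pair of active clusters
        a, b = min(((p, q) for p in clusters for q in clusters if p < q),
                   key=lambda pair: d(*pair))
        d_ab = d(a, b)
        new = next_id
        next_id += 1
        # Lance-Williams update for the median method:
        # d(k, a+b) = d(k, a)/2 + d(k, b)/2 - d(a, b)/4
        for k in clusters:
            if k not in (a, b):
                dist[(k, new)] = 0.5 * d(k, a) + 0.5 * d(k, b) - 0.25 * d_ab
        clusters[new] = clusters.pop(a) + clusters.pop(b)
    return sorted(sorted(members) for members in clusters.values())


# Toy (descriptor 1, descriptor 2) values for six hypothetical sequences
toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(median_linkage(toy, 2))  # [[0, 1, 2], [3, 4, 5]]
```

The median method weights the two merged clusters equally regardless of their sizes, which is what distinguishes it from the centroid method.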
Keywords/Search Tags: Q-e scheme, Setschenow constant, Chain segment, Glass transition temperature, Aptamer, Human hepatocellular carcinoma, C-reactive protein, IEF-PCM, Structure-property relationship, Transition state, Particle swarm optimization, Support vector machines