
Application Of Novel Cheminformatics Algorithms Studies In Chemistry, Biology And Food Science

Posted on: 2010-10-18; Degree: Doctor; Type: Dissertation
Country: China; Candidate: H Y Du
GTID: 1101360275990282; Subject: Analytical Chemistry
Abstract/Summary:
In recent years, information science and computer science have developed and the Internet has become convenient to use. Alongside them, a new interdisciplinary subject, chemoinformatics, has developed rapidly. Chemoinformatics uses various informatics methods to solve chemical problems, uncover the essence of chemical phenomena, and explain the regularities hidden in large-scale data sets. Its research area is very broad and its topics are rich, including the design and optimization of chemical experiments, analytical signal processing, chemical pattern recognition, model and parameter estimation, and artificial intelligence. Chemoinformatics arose from scientists' continuing need to extract the underlying laws from accumulated chemical knowledge.

Quantitative structure-property/activity relationship (QSPR/QSAR) study is an important applied branch of chemoinformatics algorithms. It rests on the premise that there is a quantitative relationship between the structural parameters of compounds and their biological activities or properties. QSPR/QSAR was first applied in biology and developed in response to the need for rational design of bioactive molecules. Computer science has since developed rapidly and become widely used. As a result, QSPR/QSAR studies have entered a new era and are now applied in many fields, including biology, medicine, chemistry and food science.
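The QSAR premise stated above is a quantitative relationship between structural descriptors and activity. Its simplest form is a multiple linear regression fitted by least squares and scored with R2 and RMSE, the two statistics quoted throughout this abstract. The Python sketch below is illustrative only:

- It uses toy descriptor values, not data from this work.
- It uses the common definitions R2 = 1 - SS_res/SS_tot and RMSE = sqrt(SS_res/n).
- Its function names (`solve`, `fit_mlr`, `predict`, `r2_rmse`) are hypothetical.
- It is plain least squares, not the BMLR descriptor-selection procedure used in the dissertation.

```python
"""Toy QSAR sketch: multiple linear regression (MLR) fitted by least squares.

Assumptions: helper names are hypothetical, and the descriptor/activity numbers
in the demo are made up for illustration (not data from the dissertation).
"""
from math import sqrt


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(X, y):
    """Return [b0, b1, ..., bp] for y ~ b0 + b1*x1 + ... + bp*xp via the normal equations."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve(ata, aty)


def predict(coef, X):
    """Apply the fitted linear model to each descriptor row."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]


def r2_rmse(y, y_hat):
    """R2 = 1 - SS_res/SS_tot and RMSE = sqrt(SS_res/n)."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot, sqrt(ss_res / len(y))


if __name__ == "__main__":
    # Two hypothetical descriptors per compound and a made-up activity value.
    X = [[1.2, 0.3], [2.0, 0.1], [3.1, 0.8], [4.0, 0.5], [5.2, 0.9], [6.1, 0.2]]
    y = [2.1, 2.9, 4.6, 5.1, 6.8, 6.9]
    coef = fit_mlr(X, y)
    r2, rmse = r2_rmse(y, predict(coef, X))
    print("coefficients:", coef, "R2:", round(r2, 4), "RMSE:", round(rmse, 4))
```

Solving the normal equations is adequate for a handful of well-conditioned descriptors. With many correlated descriptors, a QR- or SVD-based least-squares solver is numerically safer. Descriptor selection (done here by BMLR or a genetic algorithm) also matters more than the choice of solver.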
Using different statistical methods, we aim to develop successful theoretical models. Such models should predict the properties of compounds that have not yet been synthesized. They should also identify and describe the molecular features relevant to variations in important molecular properties, and give insight into the structural factors that affect those properties. In this way they provide information for the functional design of molecules.

The development of chemoinformatics provides a novel, practical and constructive approach to progress across the branches of chemistry. In this dissertation, we mainly discuss several novel machine learning algorithms in chemoinformatics and apply them to QSAR/QSPR research. The dissertation consists of five chapters, described as follows.

Chapter 1 describes the principles of chemoinformatics and its current research status, and introduces several novel algorithms. It also briefly describes one important applied branch of chemoinformatics, QSAR, including its history, basic theory and implementation steps.

Chapter 2 mainly discusses the application of the quantitative structure-retention relationship (QSRR) method to predicting the chromatographic retention behavior of peptides, as summarized below. (1) QSRR models correlating the structures of peptides with their retention times in reversed-phase liquid chromatography (RPLC) were developed with linear and nonlinear modeling methods. The best multilinear regression (BMLR) method was used to select the most appropriate molecular descriptors and develop a linear QSRR model. Two nonlinear regression methods, radial basis function neural networks (RBFNN) and projection pursuit regression (PPR), were used to develop the nonlinear QSRR models.
For the training set, the coefficients of determination (R2) were 0.9787 for RBFNN and 0.9881 for PPR. The corresponding root mean square errors (RMSE) were 0.5666 and 0.4207. The proposed RBFNN and PPR methods should be useful in proteomic research and could be applied to other similar research fields. (2) The novel local lazy regression (LLR) method was used for the first time to predict the retention behavior of peptides on a nickel column in immobilized metal-affinity chromatography (IMAC). The BMLR, PPR and LLR approaches were used to build linear and nonlinear QSRR models. The best model, LLR, gave R2 values of 0.9446 for the training set and 0.9252 for the test set. The comparison showed that the local learning method LLR is a very promising tool for QSRR studies. It could be applied to other chromatographic research fields and should facilitate the design and purification of peptides and proteins.

Chapter 3 describes the application of QSAR methods in agriculture and food science, as summarized below. (1) Three machine learning methods were used to investigate the relationship between the structures of thiazoline derivatives and their fungicidal activities against rice blast disease. The methods were genetic algorithm-multiple linear regression (GA-MLR), least-squares support vector machine (LS-SVM) and PPR. Both the linear and nonlinear models gave good predictions, but the nonlinear models had better predictive ability. This indicates that LS-SVM and PPR model the relationship between the structural descriptors and fungicidal activities more accurately, and that they are good modeling tools for the study of rice blast. Moreover, this study provides a new, simple and efficient approach that should facilitate the design and development of new compounds against rice blast disease.
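LS-SVM, one of the nonlinear methods in the thiazoline study, differs from standard support vector regression in two ways:

- It replaces the inequality constraints with equality constraints.
- It replaces the epsilon-insensitive loss with a squared-error loss.

As a result, training reduces to solving a single linear system. The Python sketch below assumes the common Suykens formulation with an RBF kernel K(x, z) = exp(-||x - z||^2 / sigma2). The function names, the hand-picked values of `gamma` and `sigma2`, and the toy data are illustrative assumptions, not the dissertation's settings.

```python
"""Minimal LS-SVM regression sketch (Suykens-style dual formulation).

Assumptions: hypothetical helper names; RBF kernel K(x, z) = exp(-||x - z||^2 / sigma2);
gamma and sigma2 are set by hand in the demo, and the data are toy numbers.
"""
from math import exp


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rbf(x, z, sigma2):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    return exp(-sum((xi - zi) ** 2 for xi, zi in zip(x, z)) / sigma2)


def lssvm_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y] for the bias and dual weights."""
    n = len(X)
    a = [[0.0] + [1.0] * n]  # first row encodes sum(alpha) = 0
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization term on the kernel diagonal
        a.append(row)
    sol = solve(a, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, dual weights alpha


def lssvm_predict(X_train, b, alpha, sigma2, X_new):
    """f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return [b + sum(a_i * rbf(x, x_i, sigma2) for a_i, x_i in zip(alpha, X_train))
            for x in X_new]


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # one hypothetical descriptor
    y = [0.1, 0.9, 2.2, 2.8, 4.1]  # made-up activities
    b, alpha = lssvm_fit(X, y, gamma=10.0, sigma2=1.0)
    print(lssvm_predict(X, b, alpha, 1.0, [[2.5]]))
```

Every training compound receives a nonzero weight alpha_i, so LS-SVM loses the sparsity of a standard SVM. In return, fitting is a single dense solve, which is convenient for QSAR data sets of a few dozen compounds. In practice `gamma` and `sigma2` would be tuned, for example by a grid search with cross-validation.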
(2) In the second study of this chapter, QSRR models were developed to predict the retention times of 43 constituents of saffron aroma. The constituents were analyzed by solid-phase microextraction gas chromatography-mass spectrometry (SPME-GC-MS). Linear and nonlinear QSRR models were constructed with the BMLR and PPR methods, and the predictions of both approaches agreed with the experimental data. The best model (PPR) gave R2 values of 0.9806 for the training set and 0.9456 for the test set. This study also offers a simple but efficient approach for studying the retention behavior of constituents of other similar plants and herbs.

Chapter 4 describes the application of QSAR methods in life science and medicinal research and contains the following parts. (1) The study modeled the logarithm of the retention factors (log kIAM) of 55 diverse drugs in immobilized artificial membrane (IAM) chromatography as a function of their molecular structural descriptors, using linear and nonlinear methods. The BMLR method was used to select the most important molecular descriptors and to develop a linear QSRR model. Using the selected descriptors, two nonlinear regression methods, PPR and LLR, were then used to build more accurate models. Among these methods, the LLR model gave the best predictions. Its R2 values were 0.9540 for the training set and 0.9305 for the test set, and its RMSE values were 0.2418 and 0.3949. The results also show that LLR is a promising method for QSRR modeling and could be used in other similar chromatographic research fields. (2) The study built QSAR models for the inhibition of three matrix metalloproteinases (MMP-1, MMP-9 and MMP-13) by a series of N-hydroxy-α-phenylsulfonylacetamide derivatives (HPSAs). Both linear and nonlinear modeling approaches were used. The BMLR method produced the linear QSAR model, and the global grid search PPR method was used for the first time to generate nonlinear QSAR models of MMP inhibition.
Both the linear and nonlinear models gave promising predictions. Six models were built for the different MMPs and their inhibitory activities, expressed as log(10^6/IC50). The results showed that combining PPR with the global grid search method is a very useful approach for predicting MMP inhibitory activities. They also showed that global grid search can be used in other parameter optimization tasks. (3) QSAR models were developed for a series of naphthalene, benzofuran and indole derivatives, with respect to their affinities for the MT3/quinone reductase 2 (QR2) melatonin binding site. The models used linear regression and two nonlinear regression methods, grid search-support vector machine (GS-SVM) and PPR. Five molecular descriptors selected by a genetic algorithm (GA) served as the input variables for the linear model and both nonlinear models. A comparison of the three methods indicated that PPR was the most accurate approach for predicting affinities for the MT3/QR2 melatonin binding site. This result should facilitate the design and development of new selective MT3/QR2 ligands.

Chapter 5 describes the application of QSAR methods in chemosensory research. In this chapter, QSAR models were successfully developed to predict relative sensitivities to a set of volatile organic compounds (VOCs). The sensitivities were the odor detection thresholds (ODTs) of the olfactory system and the nasal pungency thresholds (NPTs) of the nasal trigeminal chemosensory system. BMLR, SVM and LLR were used to build the regression models, and a comparison of the three showed that the LLR model gave the best results. This investigation also identified important structural information strongly correlated with the relative sensitivities of these VOCs. Such information could be used to select and manufacture chemical sensors in the future.
LLR is a promising approach for QSAR modeling and could also be used to model other similar chemical sensors.
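Local lazy regression, the approach recommended above, postpones model building until a prediction is requested. For each query compound, it takes the training compounds nearest in descriptor space and fits a linear model to that neighborhood only. The Python sketch below is a simplified illustration under these assumptions:

- Euclidean distance on raw descriptor values.
- A fixed neighborhood size `k`.
- Unweighted least squares within the neighborhood.

Published LLR variants may define and size the neighborhood differently, for example by choosing `k` with cross-validation. The function names are hypothetical.

```python
"""Local lazy regression (LLR) sketch: a separate local linear model per query.

Assumptions: hypothetical helper names; Euclidean distance on raw descriptors,
a fixed neighborhood size k, and unweighted least squares inside the neighborhood.
"""
from math import dist


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_local_mlr(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] via the normal equations."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve(ata, aty)


def llr_predict(X_train, y_train, X_query, k):
    """Predict each query from a linear fit to its k nearest training compounds."""
    # A local linear model needs at least (number of descriptors + 1) points.
    if k < len(X_train[0]) + 1:
        raise ValueError("k must be at least the number of descriptors + 1")
    preds = []
    for q in X_query:
        # Lazy step: no model exists until this query arrives.
        nearest = sorted(range(len(X_train)), key=lambda i: dist(q, X_train[i]))[:k]
        coef = fit_local_mlr([X_train[i] for i in nearest], [y_train[i] for i in nearest])
        preds.append(coef[0] + sum(c * v for c, v in zip(coef[1:], q)))
    return preds


if __name__ == "__main__":
    # Toy data with two linear regimes, y = |x|, for one hypothetical descriptor x.
    X_train = [[float(x)] for x in range(-5, 6)]
    y_train = [abs(x[0]) for x in X_train]
    print(llr_predict(X_train, y_train, [[3.5], [-2.5]], k=3))  # approximately [3.5, 2.5]
```

The toy data have two linear regimes. A single global line through all eleven points is flat by symmetry, whereas each local fit recovers the correct branch. Descriptors on very different scales should be standardized before distances are computed. The neighborhood must also be non-degenerate, or the local system becomes singular.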
Keywords/Search Tags: Chemoinformatics, QSPR/QSAR, Projection pursuit regression, Local lazy regression, Support vector machine, Least squares support vector machine, Grid search, Genetic algorithm