Disease Diagnosis And QSAR/QSPR Studies Based On Gene Expression Programming And Support Vector Machine

Posted on:2007-11-23

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H Z Si

Full Text:PDF

GTID:1104360182994213

Subject:Analytical Chemistry

Abstract/Summary:

Artificial intelligence is important for social development: it helps to solve complex problems, save time, and raise productivity. As technology advances, large amounts of data are produced and must be processed by effective methods, and the search for such methods has attracted many researchers.

In this dissertation, a novel method, gene expression programming (GEP), is introduced. GEP was invented by C. Ferreira in 1999 on the basis of the genetic algorithm (GA) and genetic programming (GP). As a new machine learning method, it has good generalization performance. GEP is therefore used here to study disease diagnosis and QSAR/QSPR.

Chapter 1 describes the principles of gene expression programming and the support vector machine (SVM) in detail, and then reviews the applications of GEP and SVM in disease, biology, and chemistry.

In Chapter 2, GEP and SVM were applied to diagnose disease and to predict disease prevalence, as follows:

(1) Linear discriminant analysis (LDA) and, as a nonlinear technique, a support vector machine (SVM) with a radial basis function (RBF) kernel were used to diagnose coronary heart disease from serum lipids and serum glucose. The prediction accuracies of SVM on the training and test sets were 96.86% and 78.18%, respectively, while those of LDA were 90.57% and 72.73%. The cross-validated prediction accuracies of SVM and LDA were 92.67% and 85.4%, respectively. The support vector machine can serve as a valid aid in the diagnosis of coronary heart disease.

(2) Predicting the diagnostic category of a serum sample from its immune indices has important applications in IgA nephropathy. The novel support vector machine algorithm and linear discriminant analysis were used to diagnose pyelonephritis and nephrotic syndrome.
We found that the prediction accuracies of the support vector machine and linear discriminant analysis were 80% and 92.85%, respectively. The results show that the support vector machine methodology produced models with better predictive performance than linear discriminant analysis. It can serve as a valid aid in the diagnosis of nephropathy.

(3) GEP was used to fit the prevalence of SARS in Beijing and Shanxi in 2003. The fitted results were very satisfactory. GEP was better than an artificial neural network (ANN) in both precision and speed.

In Chapter 3, GEP and SVM were applied to predict the properties of drugs, including:

(1) The binding rate to human plasma protein for 70 diverse drugs was modeled with a quantitative structure-activity relationship (QSAR) technique, using descriptors calculated from the molecular structure alone. The heuristic method (HM) and the support vector machine (SVM) were used to construct linear and nonlinear prediction models, giving good cross-validated correlation coefficients (R²CV) of 0.80 and 0.82, respectively. The specific information described by the heuristic linear model could provide insight into the factors likely to govern the binding rate of the compounds and could aid the drug design process; however, the predictions of the nonlinear SVM model appeared to be better than those of HM.

(2) Gene expression programming (GEP), a novel machine learning algorithm, was used for the first time to develop a quantitative model for designing and screening a series of anti-HIV compounds. Each compound was represented by several calculated structural descriptors covering its constitutional, topological, geometrical, electrostatic, and quantum-chemical features. The descriptors were searched and selected by the heuristic method. This approach produced a nonlinear, five-descriptor quantitative model based on GEP with a mean error of 0.41 and a predicted correlation coefficient (R) of 0.91.
The predictions of GEP for both the training set and the test set were better than those of SVM. This work provides a novel and effective method for drug design and screening.

(3) Gene expression programming was used for the first time to develop a quantitative model as a potential screening mechanism for a series of 1,4-dihydropyridine calcium channel antagonists. The heuristic method was used to search the descriptor space and select the descriptors responsible for activity. A nonlinear, six-descriptor model based on gene expression programming was built, with a mean-square error of 0.19 and a predicted correlation coefficient (R²) of 0.92. This work provides a new and effective method for drug design and screening.

In Chapter 4, GEP and SVM were applied to analytical science, as follows:

(1) The support vector machine, a novel machine learning technique, was used to construct a QSAR model describing the complexation of α-cyclodextrin with mono- and 1,4-disubstituted benzene derivatives from molecular descriptors. The association constants (Ka) for the inclusion complexation of cyclodextrins and benzene derivatives were calculated, and the models were found to have high precision. The correlation coefficients of the heuristic method and the support vector machine were 0.94 and 0.98, respectively. The leave-one-out cross-validation correlation coefficients of the heuristic method and the support vector machine were 0.92 and 0.95, respectively. We also found that six parameters (Molecular Weight, Max Bonding Contribution of a MO, RPCG, RPCS, DPSA-3, and BETA Polarizability) can be used not only to predict Ka for the inclusion complexation of cyclodextrins and benzene derivatives, but also to explain the mechanism by which cyclodextrin combines with its guest. The advantages and disadvantages of the two approaches were discussed, and it is concluded that the support vector machine is the better method for building QSAR models to predict Ka.
(2) The rat LD50 of 88 diverse aldehydes was modeled with a quantitative structure-activity relationship technique, using descriptors calculated from the molecular structure alone. The heuristic method and the support vector machine were used to construct linear and nonlinear prediction models, giving good cross-validation correlation coefficients (Rcv) of 0.90 and 0.93, respectively. The specific information described by the heuristic linear model could give insight into the factors likely to govern the rat LD50 of the compounds, and the predictions of the nonlinear SVM model appeared to be better than those of HM.

(3) Gene expression programming, a novel type of learning machine, was used for the first time to develop a quantitative structure-activity relationship model for 39 compounds of molecularly imprinted polymers based on calculated chemical parameters. Comparison with the heuristic method and support vector machine approaches shows that gene expression programming gives good predictions.
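
The leave-one-out cross-validation correlation coefficients quoted above can be illustrated with a minimal sketch. It assumes one common definition, Q² = 1 - PRESS / SS_tot, and a one-descriptor least-squares line in place of the heuristic method or SVM models; the abstract does not state which definition the dissertation used. The function names and the data are hypothetical and are not taken from the dissertation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def loo_q2(xs, ys):
    """Leave-one-out cross-validated R^2 (assumed form: 1 - PRESS / SS_tot)."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model with sample i held out, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        press += (ys[i] - (a + b * xs[i])) ** 2
    y_bar = mean(ys)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and activities, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
print(f"LOO Q2 = {loo_q2(descriptor, activity):.3f}")
```

Because each sample is predicted by a model that never saw it, this value is usually somewhat lower than the fitted correlation coefficient, which matches the pattern of the figures reported in the abstract.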

Keywords/Search Tags:

Chemoinformatics, QSPR/QSAR, Gene expression programming (GEP), Support vector machine (SVM), Disease diagnosis, Drug design

Related items

1	A Computational Study Of Drug Molecules Based On Chemoinformatics Methods
2	Application Of Support Vector Machine (SVM) For Prediction Of Drug Metabolism And Drug Inhibitory Activity
3	Using Ligand Similarity Profile For Drug Design
4	New Strategies To Improve The Predictivity Of QSAR Models And Its Application In Medicinal Chemistry
5	Applications Of Three-Dimensional Biologically Relevant Spectrum In Computer-Aided Drug Design
6	Structure-Based Drug Design On Aurora Kinase And Predict Of PKa Values Of Aromatic Carboxylic Acids
7	Structure-activity/property relationships of kinase inhibitors based on calculated and measured parameters, and applications in drug design and development
8	Research On Training Method Of Support Vector Machine And Its Application In Disease Diagnosis
9	Anticancer Drug Response Classification Based On Deep Neural Network And Support Vector Machine
10	Theoretical Prediction Of Drug Toxicity Based On Machine Learning Approaches