
The Identification And Prediction Of Biological Activity And Toxicity For Some Small Organic Molecules

Posted on: 2013-01-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Lu
GTID: 1110330371962143
Subject: Materials science
Abstract/Summary:
With the development of genomics, information technology, and biological inspection methods, the amount of biological information is increasing rapidly. These tremendous resources of biological information have given rise to a new interdisciplinary field: bioinformatics. Researchers explore biological knowledge by capturing, managing, depositing, retrieving, and analyzing biological information. Data mining, which extracts potentially useful information from databases, is playing an increasingly important role in bioinformatics research. In this dissertation, ensemble learning methods are used to investigate the identification and prediction of the biological activities and toxicities of some small organic molecules. The main contributions of the dissertation can be summarized as follows.
I. Prediction of biological function of small molecules based on ensemble learning algorithm
Studies on the biological functions of small molecules can help in understanding biological phenomena in molecular biology and disease mechanisms in medicine. Discovering the biological functions of small molecules experimentally requires a great deal of manpower, materials, and financial resources. In this study, an ensemble learning approach is proposed. Based on the AdaBoost method with functional group composition as the molecular representation, a novel method was used to quickly map small chemical molecules back to the metabolic pathways to which they may belong. The model reached 73.71% in the 10-fold cross-validation test and 73.8% in the independent set test. It is concluded that the proposed approach is promising for mapping unknown molecules to their possible metabolic pathways. Based on the models for predicting the metabolic pathways of small molecules, an online predictor developed in our laboratory is available at http://chemdata.shu.edu.cn/pathway.
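The 10-fold cross-validation test mentioned in Part I can be illustrated with a minimal sketch. It is not the dissertation's code: the helper names `k_fold_indices`, `cross_val_accuracy`, and `fit_majority_class` are hypothetical, the scoring is assumed to be the fraction of correct predictions, and a trivial majority-class baseline stands in for the AdaBoost model, whose details the abstract does not give.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence

Predictor = Callable[[Sequence[float]], Hashable]
Fitter = Callable[[List[Sequence[float]], List[Hashable]], Predictor]


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(X: Sequence[Sequence[float]], y: Sequence[Hashable],
                       fit: Fitter, k: int = 10, seed: int = 0) -> float:
    """Hold out each fold once, train on the rest, and pool the correct predictions."""
    correct = 0
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in held_out)
    return correct / len(X)


def fit_majority_class(X_train: List[Sequence[float]],
                       y_train: List[Hashable]) -> Predictor:
    """Stand-in learner (assumption): always predict the most frequent training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return lambda sample: most_common
```

Any learner that follows the same `fit(X_train, y_train) -> predict` shape could be passed in place of `fit_majority_class`; each row of `X` would then hold one molecule's functional group composition and `y` its pathway label.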
II. Prediction of interaction between enzymes and small molecules in metabolic pathways with integrated multiple classifiers
Information about interactions between enzymes and small molecules is important for understanding various metabolic bioprocesses. We applied a majority voting system that combines several classifiers, including AdaBoost, Bagging, and KNN, to predict the interactions between enzymes and small molecules in metabolic pathways. The advantage of this strategy is that a predictor based on majority voting can usually provide more reliable results than any single classifier. The prediction accuracies on the training dataset and an independent testing dataset were 82.8% and 84.8%, respectively. The prediction accuracy for the networking couples in the independent testing dataset was 75.5%, about 4% higher than that reported in a previous study. An implementation of the proposed prediction method is available at http://chemdata.shu.edu.cn/small-enz.
III. Quantitative structure-property relationship based on support vector regression for narcotics toxicities
The quantitative structure-toxicity relationship of narcotics was studied using support vector regression (SVR), multiple linear regression (MLR), partial least squares (PLS), and a back propagation artificial neural network (BP-ANN). The molecular descriptors contributing to toxicity were selected from various features obtained using quantum chemistry methods. The root-mean-square errors of the SVR, MLR, PLS, and BP-ANN models were 0.283, 0.385, 0.392, and 0.466, respectively. The results indicate that the prediction accuracy of the SVR model is higher than those of the MLR, PLS, and BP-ANN models. It is expected that SVR is a useful chemometric tool in the research of structure-toxicity relationships.
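The majority voting scheme described in Part II can be sketched as follows. This is an illustrative sketch only: the three threshold rules are hypothetical stand-ins for the trained AdaBoost, Bagging, and KNN classifiers, and breaking ties in favour of the earliest classifier is an assumption, since the abstract does not say how ties are handled.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

Classifier = Callable[[Sequence[float]], Hashable]


def majority_vote(classifiers: Sequence[Classifier],
                  sample: Sequence[float]) -> Hashable:
    """Return the label predicted by the most classifiers for one sample."""
    if not classifiers:
        raise ValueError("at least one classifier is required")
    votes = [clf(sample) for clf in classifiers]
    counts = Counter(votes)
    top = max(counts.values())
    # Assumed tie-break: among the tied labels, keep the one voted first.
    for label in votes:
        if counts[label] == top:
            return label


# Hypothetical stand-in classifiers: 1 means "interacts", 0 means "does not".
def rule_a(sample: Sequence[float]) -> int:
    return 1 if sample[0] > 0.5 else 0


def rule_b(sample: Sequence[float]) -> int:
    return 1 if sample[1] > 0.5 else 0


def rule_c(sample: Sequence[float]) -> int:
    return 1 if sample[0] + sample[1] > 1.0 else 0


if __name__ == "__main__":
    ensemble = [rule_a, rule_b, rule_c]
    for pair in ([0.9, 0.2], [0.1, 0.6]):
        print(pair, "->", majority_vote(ensemble, pair))
```

With an odd number of binary classifiers, as here, a tie cannot occur; the tie-break only matters for even-sized ensembles or problems with more than two classes.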
Keywords/Search Tags: bioinformatics, machine learning, ensemble learning, support vector machine (SVM), enzyme, small molecule, metabolic pathway, functional group composition, majority voting, narcotics toxicity, quantitative structure-property relationship