Theoretical Prediction Of Drug Toxicity Based On Machine Learning Approaches

Posted on:2018-06-07

Degree:Doctor

Type:Dissertation

Country:China

Candidate:T L Lei

Full Text:PDF

GTID:1314330542951153

Subject:Pharmacology

Abstract/Summary:

Toxicity is one of the main reasons drug candidates fail during development, so evaluating toxicity in the early stages of drug development and excluding compounds with relatively high toxicity would improve the efficiency and success rate of drug development. However, in vitro and in vivo experimental testing for most toxicity endpoints is labor-intensive, time-consuming and costly, so there is strong demand for efficient and robust in silico prediction models for high-throughput toxicity screening. In this thesis, we used a number of machine learning algorithms to develop a series of in silico prediction models for acute toxicity, respiratory toxicity and urinary tract toxicity, and discussed the performance and applicability of these algorithms. The research results and conclusions are as follows:

(1) Based on a comprehensive dataset of rat oral acute toxicity with 7,385 compounds, relevance vector machine (RVM), support vector machine (SVM), k-nearest neighbor regression (k-NN), random forest (RF), local approximate Gaussian process (laGP), multilayer perceptron ensemble (MLPE) and eXtreme gradient boosting (XGBoost) algorithms were employed to construct a series of regression prediction models. Modified chi-square statistics were used to reduce the dimensionality of the hybrid set of molecular descriptors and fingerprints (PubchemFP or SubFP). The RVM with the Laplace kernel function achieved the best prediction performance (q²ext = 0.640–0.659). We also constructed several consensus prediction models, the best of which yielded accurate predictions for the test set (q²ext = 0.689). Finally, we analyzed the important molecular descriptors and molecular fingerprints related to acute toxicity.

(2) A dataset of various respiratory toxicity endpoints in mouse was employed to develop multiple regression and classification prediction models by using a number of machine learning approaches, including RVM, SVM, regularized random forest (RRF), XGBoost, naive Bayes (NB) and linear discriminant analysis (LDA). To determine the optimal subset of molecular descriptors, a four-tier strategy (normalization, chi-square filtering, univariate rfSBF filtering, and recursive feature elimination based on RF) was used to reduce the dimensionality of the original set of molecular descriptors. Among all of the prediction models, the SVM model with the Laplace kernel function achieved the best quantitative predictions for the test set (q²ext = 0.707), and the XGBoost model gave the best classification predictions for the compounds in the test set (MCC = 0.644, AUC = 0.8935, sensitivity = 82.24%, specificity = 83.21%, and global accuracy = 82.62%). In addition, several approaches were used to analyze the applicability domains of the models. By using the leverage method, 41 response outliers (h_i > 0.004), 23 structurally influential outliers (standard deviation > 3) and 31 influential compounds (Cook's distance > 0.00388) were identified. Finally, the structural features of the compounds that were predicted with large errors by the best regression model and those of the compounds misclassified by the best classification model were systematically analyzed.

(3) Based on a dataset of various urinary tract toxicity endpoints in mouse, several algorithms (RVM, SVM, RRF, C5.0, XGBoost, AdaBoost.M1, SVMBoost and RVMBoost) were used to build multiple regression and classification prediction models. The optimal subsets of molecular descriptors for regression and classification were selected by using recursive feature elimination based on RF. Among all of the prediction models, the rbfSVMBoost regression model achieved the best quantitative predictions for the test set (q²ext = 0.845), and the rbfSVMBoost classification model gave the best qualitative predictions for the test set (MCC = 0.787, AUC = 0.893, sensitivity = 89.58%, specificity = 94.12%, and global accuracy = 90.77%). In addition, several approaches were used to analyze the applicability domains of the models. By using the leverage method, 3 response outliers (h_i > 0.762), 4 structurally influential outliers (standard deviation > 3) and 10 influential compounds (Cook's distance > 0.02797) were identified. Finally, the structural features of the compounds that were predicted with large errors by the best regression model and those of the compounds misclassified by the best classification model were systematically analyzed.

(4) We also tested the performance and applicability of several new machine learning methods. The performance of RVM, XGBoost and SVMBoost was satisfactory, whereas that of RRF and laGP was relatively poor and needs to be improved.
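The evaluation statistics quoted in the abstract can be illustrated with a short Python sketch. It is not the thesis's own code, and several details are assumptions because the abstract does not define them: q²ext is taken here as 1 minus the ratio of the test-set squared prediction error to the squared deviation of the test responses from the training-set mean, and leverage is taken as the diagonal of the hat matrix built from the training descriptors. The function names (`q2_ext`, `classification_metrics`, `leverages`) are hypothetical.

```python
import numpy as np


def q2_ext(y_test, y_pred, y_train_mean):
    """External q^2 (assumed definition): 1 - PRESS / SS around the training mean."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    press = np.sum((y_test - y_pred) ** 2)
    ss = np.sum((y_test - y_train_mean) ** 2)
    return 1.0 - press / ss


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, global accuracy and MCC for binary labels.

    Assumes both classes occur in y_true (otherwise a ratio is undefined).
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / y_true.size,
        "mcc": mcc,
    }


def leverages(X_train, X_query):
    """Leverage h_i = x_i^T (X^T X)^+ x_i of each query row (assumed definition)."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)
```

A compound whose leverage exceeds a chosen warning threshold would be treated as lying outside the applicability domain; the thresholds quoted in the abstract are specific to the thesis's datasets and are not reproduced by this sketch.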

Keywords/Search Tags:

Quantitative Structure-Activity Relationship, Quantitative Structure-Toxicity Relationship, QSAR, QSTR, Toxicity Prediction, Acute Toxicity, Respiratory Toxicity, Urinary Tract Toxicity, Support Vector Machine, Relevance Vector Machine, Random Forest

Related items

1	Study On The Prediction Of Drug Toxicity Based On The Molecular Structural Characteristic
2	Study On Prediction Of Drug Toxicity
3	Quantitative Structure-activity Relationship Studies Of The Toxicological Properties Of Benzene Compounds
4	Prediction Of Toxicity By Structure And Activity Relationship
5	Comparative Studies On Quantitative Predicting Models Of Environmental Toxicity And Skin Permeability For Organic Chemicals
6	Research On Modeling Methods Of Quantitative Structure-activity Relationship For Environmental Chemical Toxicity
7	The Research Of Toxicity Of Conotoxin MVIIA And Structure-activity Relationship Of SO-3
8	Classification And Quantitative Structure And Bioactivity Relationship Study On Human Acetylcholinesterase Inhibitors
9	On Structural Characterization For Representative Pharmaceuticals And Bioactivity Prediction Through Quantitative Structure-Activity Relationship
10	Quantitative Risk Assessment of the Pulmonary Toxicity of Nanoparticles by Machine-Learning-Enabled Meta-Analysis