An Improved Bayesian Model Applied To The Prediction Of Cytochrome P450 Enzyme-Substrate Selectivity

Posted on: 2017-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zheng
Full Text: PDF
GTID: 2370330590991720
Subject: Bioinformatics
Abstract/Summary:
Each drug can potentially be metabolized by one or more cytochrome P450 (CYP) isoforms, which makes cytochrome P450 the most important family of drug-metabolizing enzymes. Progress in classifying and predicting the isoform specificity of CYP enzymes has greatly broadened the understanding of drug metabolism and may therefore help guide new drug research. In the present study, we provide an improved Bayesian model and, for comparison, two other commonly used machine learning methods, K Nearest Neighbor (KNN) and Support Vector Machine (SVM), to predict metabolic information for 11 CYPs based on selected structural and physicochemical properties of 742 substrates related to those CYP subfamilies.

To organize this large body of information, a database named CYP-Meta was created, containing 742 small-molecule substances and more than 1,500 items of metabolic information. The substances are separated into three groups (substrate, inhibitor, and inducer) according to their characteristics, and the group is shown in the type column of the main table of CYP-Meta. The main table also includes columns for the CID and the name of each molecule. To ensure information integrity, CID and molecule name are used as a joint primary key, because different names can belong to one molecule.

As hoped, with an average accuracy of more than 0.89 for the Bayesian model compared with less than 0.73 for both KNN and SVM, the results show that the Naïve-Bayesian model we provide exhibits higher predictive capability than the other two methods on the test sets for all eleven CYP450 isozymes. Other indicators from the results, such as sensitivity and selectivity, support the same conclusion that the Bayesian model predicts better than the others. Moreover, using graphical and data analysis, we discuss the most important descriptors, which were scored and automatically selected by our algorithm.

Finally, it is worth noting that with the newly designed Naïve-Bayesian algorithm the prerequisite of independent random variables is no longer needed for the datasets, and we also provide an advanced algorithm for conditional probability calculations; we thereby overcome the two dominant restrictions on the use of the Naïve-Bayesian method. Additionally, presetting single-label or multi-label mode before prediction is no longer needed in our models, and all three models built in this study are broadly applicable and can be easily applied to other fields of research.
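
The abstract compares models by accuracy, sensitivity, and selectivity on per-isozyme test sets. As a minimal sketch of how such indicators are commonly computed for one isozyme, the Python below derives them from binary labels (1 = substrate of the isozyme, 0 = not). The function name `binary_metrics`, the label encoding, and the example data are illustrative assumptions, not taken from the thesis; the sketch also assumes that "selectivity" corresponds to the usual specificity (true negative rate), which the abstract does not confirm.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # true positive rate
    specificity: float  # true negative rate (assumed meaning of "selectivity")


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute accuracy, sensitivity, and specificity for 0/1 labels.

    Hypothetical helper for illustration; not the thesis's evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Guard against empty input or a class that never occurs.
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return BinaryMetrics(accuracy, sensitivity, specificity)


if __name__ == "__main__":
    # Made-up labels for a single isozyme, for demonstration only.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(truth, predicted))
```

Averaging the per-isozyme accuracy over all eleven isozymes would give a single summary figure of the kind quoted above, although the abstract does not state exactly how its average was formed.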
Keywords/Search Tags: data mining, machine learning, Naïve-Bayesian, support vector machine, K-nearest-neighbor, cytochrome P450