| Metabolic stability is a key property of drugs because it determines their major pharmacokinetic properties. Effective and reliable methods for predicting in vitro stability and in vivo pharmacokinetic parameters are therefore of great value, since they can reduce the risk of costly failures in the clinical stage caused by drug candidates that are too unstable or too stable. The large volume of in vitro and in vivo data produced by high-throughput screening, together with the availability of related databases and high-quality data, now provides the conditions needed to establish in silico prediction models. Machine learning methods, with their ability to classify diverse structures and handle complex mechanisms, are well suited to predicting drug stability and pharmacokinetic parameters. However, many urgent problems remain in machine-learning-based prediction of metabolic stability in vitro and in vivo. Such models usually perform well in internal validation but poorly in external validation, that is, they generalize badly. There are two main reasons. First, the models use too many features rather than the key features that describe the specific property. Second, the training set is not representative because the number of training samples is limited. Therefore, in the first empirical study of this paper, I build a support vector regression (SVR) model that predicts the in vitro plasma stability of compounds using a new feature selection method. During feature selection, I found that reasonable selection and combination of features can greatly improve the external prediction performance of the model. Kernel principal component analysis (KPCA) further illustrates the importance of feature selection by representing the linear separability of the data in high-dimensional space under different feature combinations. The final SVR model offers convenient support for in silico human plasma 
stability prediction and screening. Meanwhile, the important descriptors and fingerprints can help in the design of prodrugs and soft drugs. In the second empirical study, the in vivo half-life was selected as the target pharmacokinetic parameter. First, to increase the size and structural diversity of the training set, I collected as much data as possible from databases and the literature; as far as I know, the resulting drug half-life data set is the largest of its kind. I then analyzed the influence of molecular descriptors and external factors on half-life. Next, two classical machine learning methods, the Naive Bayes classifier and recursive partitioning, were used to build models that identify the half-life level of a drug, and the prediction performance of different combinations of molecular fingerprints and descriptors was also explored. The final Naive Bayes model shows a certain recognition ability. The descriptors and fingerprints associated with drug half-life, together with the prediction model, can help distinguish drugs with relatively long and short half-lives before clinical trials and can serve different needs before drugs are designed and synthesized. |
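
To make the second study's classification step concrete, the following is a minimal sketch of a Bernoulli Naive Bayes classifier over binary molecular fingerprints. It is an illustration under stated assumptions, not the model built in this work: the 4-bit fingerprints, the "long"/"short" labels, the function names `train_bernoulli_nb` and `predict`, and the smoothing parameter `alpha` are all hypothetical.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(fingerprints, labels, alpha=1.0):
    """Fit per-class bit probabilities with Laplace smoothing (alpha is an assumed default)."""
    n_bits = len(fingerprints[0])
    class_counts = defaultdict(int)
    bit_counts = defaultdict(lambda: [0] * n_bits)
    for fp, label in zip(fingerprints, labels):
        class_counts[label] += 1
        for i, bit in enumerate(fp):
            bit_counts[label][i] += bit
    total = len(labels)
    model = {}
    for label, count in class_counts.items():
        log_prior = math.log(count / total)
        # Smoothed probability that each fingerprint bit is set within this class.
        probs = [(bit_counts[label][i] + alpha) / (count + 2 * alpha)
                 for i in range(n_bits)]
        model[label] = (log_prior, probs)
    return model


def predict(model, fp):
    """Return the class with the highest log-posterior for one fingerprint."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for bit, p in zip(fp, probs):
            score += math.log(p if bit else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: 4-bit fingerprints with made-up half-life levels.
train_fps = [
    [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1],
    [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0],
]
train_labels = ["long", "long", "long", "short", "short", "short"]

model = train_bernoulli_nb(train_fps, train_labels)
print(predict(model, [1, 1, 0, 0]))
print(predict(model, [0, 0, 1, 1]))
```

In practice the fingerprints would come from a cheminformatics toolkit and the model would be judged on an external test set, since the abstract stresses that internal validation alone overstates performance.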