Research On Drug Target Recognition And Activity Prediction Model Based On Molecular Vibration Characteristics

Posted on: 2020-10-18; Degree: Master; Type: Thesis
Country: China; Candidate: C M Jia
GTID: 2434330575476739; Subject: Herbs Analysis
Abstract/Summary:
Background: At present, the targets and biological activities of most chemical constituents of traditional Chinese medicines (TCM) remain uncertain, which has become one of the bottlenecks in elucidating the material basis and mechanism of action of TCM. Research on the targets of TCM chemical constituents and their biological activity can help reveal the extent of the efficacy of TCM in treating diseases and uncover quantitative relationships between TCM chemical constituents and new targets. This provides guidance and clues about the mechanisms by which TCM chemical constituents exert therapeutic effects in vivo, and also helps to reposition the targets of these constituents. As science and technology advance, more and more chemical constituents of TCM are being discovered. Given limited time and funding, determining the biological activity between TCM chemical constituents and related targets by traditional experimental methods faces great challenges. Building a quantitative prediction model of drug-target interactions with machine learning can identify the quantitative relationship between TCM chemical constituents and targets, compensating for the limitations of traditional experiments. This approach is considered an effective means of studying the quantitative relationship between drugs and targets. In recent years, a growing number of models predicting drug-target interactions have been reported. Most of these models only determine whether a drug and a target interact. A few address the quantitative relationship between drugs and targets, but their predicted values differ from experimentally measured values and each covers only one type of quantitative relationship, so the accuracy and scope of such models need further improvement. Therefore, the
establishment of a quantitative prediction model for drug-target interactions with high predictive performance and a wide application range is a problem that remains to be solved for TCM chemical constituents and their targets.

Objective: The purpose of this paper is to construct a quantitative prediction model for drug-target interactions (DTIs) with high predictive performance and a wide application range, in order to compensate for the shortcomings of determining quantitative drug-target relationships by experimental means. This can improve the accuracy and scope of current quantitative DTI prediction models and provide clues and guidance for clarifying the material basis and mechanism of TCM.

Methods: Investigation of quantitative DTI databases. Existing DTI databases were evaluated on five aspects: reliability, accuracy, completeness, availability and applicability. Reliability concerns the source of the data. Accuracy depends on whether the standard of the data collected by the database (the unit of the activity value) is consistent. Completeness examines the coverage of currently known DTIs. Availability concerns how difficult the data are to obtain, and applicability depends on whether the data information is complete. The best data source for this work was chosen on the basis of these five aspects. Construction of the quantitative DTI prediction model. First, molecular descriptors of the compounds and sequence descriptors of the targets were calculated from the collected DTI data; the compound descriptors were then screened from the perspective of molecular vibration to obtain a subset of characteristic descriptors, which was integrated into the quantitative DTI data set. Second, the data set was preprocessed, including data cleaning, integration, transformation and specification. Data cleaning refers
to the removal of outliers. Data integration refers to combining the collected data. Data transformation refers to converting the data into a form suitable for modeling. Data specification refers to the normalization of the data. Third, after feature screening, quantitative DTI prediction models were constructed from the selected feature subsets using random forest, support vector machine and artificial neural network algorithms, respectively. The reliability of the models was verified by cross-validation. Each constructed model was used to predict the training set and the test set. The experimentally measured values (true values) and predicted values were compared, and their difference and absolute difference were calculated. A scatter plot of true versus predicted values was drawn. The optimal model was selected according to regression evaluation metrics such as the coefficient of determination (R2) and the mean squared error (MSE), and was compared with previously reported models to demonstrate its reliability. Further verification of the applicability and reliability of the model: application of the optimal prediction model to predicting the quantitative relationship between TCM chemical constituents and targets. Quantitative interaction data for TCM chemical constituents and targets in the BindingDB database that were not involved in model building were collected. Following the same data investigation principles, these data were compiled into a new test set. The new test set was then predicted using the best model obtained. The reliability and applicability of the model constructed in this study were verified by comparing the predictions with the experimentally measured values.

Results: The quantitative drug-target relationship data in the ChEMBL database were selected as the data source for this paper. Six quantitative prediction models of drug-target interactions, quantified by EC50 and KD values, were
established. The models were built on the datasets collected in this paper, which involve 2207 compounds and 1254 targets, for a total of 21999 relationships. Screening the compound descriptors from the molecular vibration point of view yielded 814 descriptors. First, the models built with the random forest algorithm performed well on both the training set and the test set: the EC50 model had R2 greater than 0.96 and MSE less than 0.09, and the KD model had R2 greater than 0.94 and MSE less than 0.12. Second, the models built with the support vector machine algorithm performed better on the training set than on the test set. The EC50 model had R2 = 0.9317 and MSE = 0.1270 on the training set, and R2 = 0.5759 and MSE = 0.8356 on the test set. The KD model had R2 = 0.9099 and MSE = 0.1254 on the training set, and R2 = 0.5083 and MSE = 0.7290 on the test set. Third, the models built with the artificial neural network algorithm also performed better on the training set than on the test set. The EC50 model had R2 = 0.7350 and MSE = 0.4867 on the training set, and R2 = 0.5211 and MSE = 0.9590 on the test set. The KD model had R2 = 0.5857 and MSE = 0.5612 on the training set, and R2 = 0.2961 and MSE = 1.019 on the test set. The regression metrics show that, among the three machine learning algorithms, the model built with the random forest algorithm is the best, and that the other two algorithms may be over-fitting. Compared on the same evaluation metrics with previously reported models, the optimal model constructed in this paper showed higher predictive performance. The reliability of the optimal model was checked using the new test set, and the predicted values were greater than the true values. This may be due to the
different sources of the data: the data inclusion standards of the BindingDB and ChEMBL databases differ. Because the predicted values are greater than the true values and fall within a certain range, a correction factor can be set to obtain the desired prediction result. The correction factor can be obtained by calculating the absolute value of the difference between the true value and the predicted value. This also demonstrates, to some extent, the reliability and applicability of the quantitative prediction model in this paper.

Conclusions: This paper first proposed screening the molecular descriptors of compounds from the perspective of molecular vibration. A quantitative prediction model for drug-target interactions was successfully established. Regression evaluation metrics were used to confirm the reliability and accuracy of the quantitative drug-target prediction model built with the random forest algorithm. Support vector machine and artificial neural network algorithms may not be suitable for constructing large-scale quantitative drug-target prediction models. By comparison, the predictive performance and application scope of the model established in this study are better than those of models already reported in the literature. Finally, based on the optimal model, quantitative predictions were made for some drugs and targets in the BindingDB database. The results show that the accuracy and reliability of the quantitative drug-target interaction prediction model constructed in this paper further support the objectivity of compound descriptors selected from the perspective of molecular vibration.
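
The evaluation steps described in the abstract (computing R2 and MSE, comparing training-set and test-set performance, and applying a correction factor) can be illustrated with a minimal Python sketch. This is not the thesis's code. All function names are hypothetical, the 0.2 gap threshold for flagging possible over-fitting is an arbitrary illustrative choice, and treating the correction factor as the mean absolute difference subtracted from each prediction is only one possible reading of the abstract. The R2 values in `reported_r2` are the training/test figures quoted in the abstract.

```python
# Illustrative sketch only; not the thesis's actual code.
# Function names, the 0.2 threshold and the form of the correction
# factor are assumptions made for this example.

def mse(y_true, y_pred):
    """Mean squared error between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def apply_mean_offset(y_true_ref, y_pred_ref, y_pred_new):
    """Assumed correction: subtract the mean absolute difference seen on a
    reference set from new predictions (for predictions that run high)."""
    diffs = [abs(t - p) for t, p in zip(y_true_ref, y_pred_ref)]
    offset = sum(diffs) / len(diffs)
    return [p - offset for p in y_pred_new]


# (training R2, test R2) as quoted in the abstract.
reported_r2 = {
    ("SVM", "EC50"): (0.9317, 0.5759),
    ("SVM", "KD"): (0.9099, 0.5083),
    ("ANN", "EC50"): (0.7350, 0.5211),
    ("ANN", "KD"): (0.5857, 0.2961),
}

GAP_THRESHOLD = 0.2  # assumed cut-off, not taken from the thesis

# Models whose training R2 exceeds test R2 by more than the threshold.
possible_overfit = {
    key: round(train - test, 4)
    for key, (train, test) in reported_r2.items()
    if train - test > GAP_THRESHOLD
}

if __name__ == "__main__":
    for (algo, measure), gap in possible_overfit.items():
        print(f"{algo} / {measure}: train-test R2 gap = {gap}")
```

Under this assumed threshold, all four SVM and ANN models are flagged, which is in line with the abstract's remark that these two algorithms may be over-fitting.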
Keywords/Search Tags: molecular vibration, machine learning, feature screening, drug-target quantitative prediction, chemical composition of traditional Chinese medicine