Study on the Structure-Activity Relationship of PARP-1 Inhibitors by Machine Learning

Posted on: 2021-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Yin
Full Text: PDF
GTID: 2381330605971582
Subject: Pharmaceutical engineering
Abstract/Summary:
Poly(ADP-ribose) polymerase-1 (PARP-1) is the most abundant isoform of the PARP family and is considered a promising anticancer target. PARP-1 is involved in DNA repair and the regulation of cell death in eukaryotic cells, and it is an important research target for breast, ovarian and prostate cancer. This thesis takes PARP-1 inhibitors as its research object and uses machine learning algorithms such as support vector machine (SVM) and random forest (RF) to carry out a classification study of high and low inhibitor activity, a scaffold clustering study and a quantitative study of activity values. The specific work is as follows.

(1) Classification models of highly and weakly active PARP-1 inhibitors. A database of 2416 compounds was collected for the first time, with IC50 values ranging from 0.21 nM to 210,000 nM. According to the distribution of the data set, two thresholds were used to define the activity classes: compounds with an IC50 of 50 nM or less were labeled highly active, and compounds with an IC50 of 500 nM or more were labeled weakly active. The data set was divided into a training set and a test set by random stratified sampling at a ratio of 3:1. The 1227 training-set samples were used to build the models, while the 410 test-set samples took no part in modeling and were used only to verify the generalization ability of the final models. All compounds were characterized with CORINA descriptors, MACCS fingerprints and ECFP4 fingerprints. Using the SVM and RF algorithms, the model parameters and the number of descriptors were optimized together, the optimal model was chosen for each descriptor type and algorithm, and six models were established. In addition, the applicability domain of each model was defined using the Euclidean distance. For all six models, the accuracy (Q) on both the training set and the test set was greater than 0.85 and the Matthews correlation coefficient (MCC) was above 0.7, indicating good predictive and generalization ability. Finally, 14 core scaffolds were obtained by clustering with the density-based spatial clustering of applications with noise (DBSCAN) algorithm. Compounds represented by 4 of these scaffold types were found to have a high probability of being highly active inhibitors.

(2) Quantitative models of PARP-1 inhibitors. Compounds whose IC50 values had been determined by enzyme-linked immunoassay, the assay group containing the most compounds, were selected for the regression study. The quantitative data set contained 513 compounds and was randomly split three times; in each split the training set and test set contained 385 and 128 compounds, respectively. CORINA descriptors and RDKit 2D descriptors were calculated for the compounds, and 18 regression models were built using the SVM, RF and multiple linear regression (MLR) algorithms. Among them, the random forest models based on CORINA descriptors (the 1A series) had the highest mean coefficient of determination (R2mean = 0.595) and the lowest mean root mean square error (RMSEmean = 0.524) on the test set. Analysis of the descriptors shared by the 1A series models showed that atomic charge, the number of tetrahedral chiral centers, the total molecular charge and the molecular diameter contributed strongly to the predicted biological activity.

The machine learning models developed in this work can be used for virtual screening of PARP-1 inhibitors, and the scaffold clustering results can help in the design of lead compounds.
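
As an illustration only, the two-threshold labeling and the two classification metrics reported above (Q and MCC) can be sketched in plain Python. This is not the thesis code: the function names are hypothetical, and leaving compounds between 50 nM and 500 nM unlabeled is an assumption, since the abstract does not say how they were handled.

```python
import math

HIGH_MAX_NM = 50.0   # from the abstract: IC50 <= 50 nM counts as highly active
LOW_MIN_NM = 500.0   # from the abstract: IC50 >= 500 nM counts as weakly active


def label_activity(ic50_nm):
    """Return 1 (highly active), 0 (weakly active) or None (between thresholds)."""
    if ic50_nm <= HIGH_MAX_NM:
        return 1
    if ic50_nm >= LOW_MIN_NM:
        return 0
    return None  # assumption: intermediate compounds are left unlabeled


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = highly active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Accuracy Q: fraction of compounds assigned to the correct class."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

MCC uses all four cells of the confusion matrix, so it stays informative when the two activity classes differ in size, which is one reason it is commonly reported alongside accuracy.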
Keywords/Search Tags: PARP-1, inhibitors, classification model, quantitative model, scaffold clustering