| The application of high-throughput screening (HTS) and combinatorial chemistry (CC) techniques did not improve the efficiency of drug discovery, but it did accumulate a large amount of bioactivity data. With the development of open-source drug design, more and more bioactivity data are freely available to the public. How to use these data effectively to guide drug development is an important problem for medicinal chemists. Considering that the bioactive conformational space of ligands is limited, we proposed a molecular descriptor, the Three-Dimensional Biologically Relevant Spectrum (BRS-3D), defined as the shape-similarity profile of a molecule against PDB (Protein Data Bank) ligands used as templates. We applied BRS-3D to quantitative structure-activity relationship (QSAR) analysis and ligand-based virtual screening (LBVS) and verified its effectiveness. The thesis comprises the following five parts. 1) Construction of the three-dimensional (3D) biologically relevant representative compound database (BRCD-3D). Using the 9878 ligands in the sc-PDB database as candidates, we first calculated the 3D structural similarity between every pair of ligands to obtain a similarity matrix. Based on this matrix, a clustering approach was then used to select 300 structurally diverse ligands, which served as a representative subset of the bioactive conformational space and were termed BRCD-3D. 2) Automatic calculation of BRS-3D and the influence of different parameters on the results. With the 300 BRCD-3D ligands as templates, each query molecule was flexibly superimposed onto all 300 ligands, producing a 300-dimensional vector of shape-similarity scores, which constitutes the BRS-3D of that molecule. Comparing BRS-3D vectors calculated with different parameters, we found that scoring by 3D molecular superimposition was better than scoring by molecular docking, and that a reasonable initial 3D conformation of the query molecule helped to obtain stable results. In addition, the calculation of BRS-3D was not affected by the charge type or the computing platform. 3) Validation of BRS-3D in QSAR analysis. Using the support vector machine (SVM) method with BRS-3D as the feature variables, 42 actives-versus-decoys prediction models were established for different GPCR targets. The results showed that the BRS-3D-based models could effectively distinguish actives from decoys. Feature selection studies on these models showed that models built with all BRS-3D features achieved the best predictive performance. However, for a few models, using only 30% of the BRS-3D features achieved similar performance, which substantially reduced model complexity and the risk of over-fitting. The performance of BRS-3D was also compared with other state-of-the-art 2D and 3D molecular descriptors: models built with BRS-3D performed much better than those built with MOE 3D descriptors, and on some data sets even better than those built with Dragon 2D descriptors. 4) Evaluation of BRS-3D-based similarity search. Thirteen benchmark data sets from the DUD (Directory of Useful Decoys) were selected. First, we compared how different similarity metrics affected the early recognition of actives of different structural types and the overall predictive performance. The results indicated that the cosine similarity coefficient was the best metric for comparing the BRS-3D of the query molecule with those of database compounds; they also demonstrated that the BRS-3D method effectively enriched actives of different structural types and had scaffold-hopping ability. In addition, the performance of BRS-3D on the same 13 data sets was compared with the FieldScreen, DOCK, LigMatch and 2D fingerprint methods: BRS-3D outperformed FieldScreen and DOCK but was inferior to LigMatch and the 2D fingerprint methods. 5) Application of BRS-3D-based predictive models to virtual screening for histone deacetylase 1 (HDAC1) inhibitors. Several screening models were first built: a Bayesian discriminant model based on the physicochemical properties of HDAC1 inhibitors; an active/inactive SVM discriminant model and a high/low-activity SVM discriminant model, both based on BRS-3D; a kNN (k-nearest neighbor) regression model based on BRS-3D; and a 3D pharmacophore model based on SAHA, an approved drug targeting HDAC1. These models were then integrated into two multi-step virtual screening workflows, which were used to filter drug-like or lead-like molecules from the Specs, Enamine and ChemDiv databases. Finally, 144 hit compounds were obtained and assayed for HDAC1 activity, two of which showed IC50 values of 43.99 μM and 30.07 μM, respectively. In summary, this thesis proposed a new molecular descriptor, BRS-3D, implemented its automatic calculation, and explored the parameters involved in the calculation process. The effectiveness of BRS-3D in QSAR analysis and LBVS was then verified, providing an effective way to utilize and integrate the bioactivity data produced by HTS and CC techniques. |
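Part 1 selects 300 diverse representatives from the sc-PDB similarity matrix with a clustering approach that the abstract does not specify. As a minimal sketch, assuming the matrix holds similarities in [0, 1], the code below swaps in k-medoids clustering on the distance 1 − similarity and returns the medoids as the representative subset. The helper name `select_representatives` and the MaxMin-style initialisation are illustrative assumptions, not the thesis's method.

```python
from typing import List, Sequence


def select_representatives(sim: Sequence[Sequence[float]], k: int,
                           max_iter: int = 100) -> List[int]:
    """Hypothetical helper: k-medoids on distance = 1 - similarity.

    Stands in for the unspecified clustering approach; returns sorted medoid indices.
    """
    n = len(sim)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of ligands")
    dist = [[1.0 - sim[i][j] for j in range(n)] for i in range(n)]

    # MaxMin initialisation: start from ligand 0, then repeatedly add the
    # ligand whose nearest chosen medoid is farthest away.
    medoids = [0]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(max(candidates,
                           key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(max_iter):
        # Assignment step: each medoid keeps itself; every other ligand
        # joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = [min(members, key=lambda c: sum(dist[c][j] for j in members))
                       for members in clusters.values()]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)
```

For example, with two tight groups of three ligands each, `select_representatives(sim, 2)` returns one index from each group. For the full 9878 × 9878 matrix, an array-based implementation would be far more practical than nested Python lists.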
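Part 2 turns each query molecule into a 300-dimensional vector of shape-similarity scores, one per BRCD-3D template. The abstract does not say which shape score the superimposition program reports, so the sketch below assumes one common choice: a shape Tanimoto from first-order Gaussian atomic overlap, computed on poses that are already superimposed. The flexible alignment itself is omitted, atoms are reduced to 3D coordinates, and the shared Gaussian exponent `ALPHA` is a hypothetical placeholder.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]

ALPHA = 0.8  # hypothetical Gaussian exponent shared by all atoms


def overlap(a: Coords, b: Coords) -> float:
    """First-order Gaussian overlap volume, up to a constant factor."""
    total = 0.0
    for xa, ya, za in a:
        for xb, yb, zb in b:
            d2 = (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            # Two equal-exponent Gaussians overlap as
            # (pi / (2*ALPHA))**1.5 * exp(-ALPHA/2 * d2); the constant cancels below.
            total += math.exp(-0.5 * ALPHA * d2)
    return total


def shape_tanimoto(a: Coords, b: Coords) -> float:
    """Shape Tanimoto V_AB / (V_AA + V_BB - V_AB) for pre-aligned poses."""
    v_ab = overlap(a, b)
    return v_ab / (overlap(a, a) + overlap(b, b) - v_ab)


def brs3d_vector(query: Coords, templates: Sequence[Coords]) -> List[float]:
    """One shape-similarity score per template, in template order."""
    return [shape_tanimoto(query, t) for t in templates]
```

Passing the 300 template poses to `brs3d_vector` yields a 300-dimensional profile. In practice, each score would come from a flexible-alignment tool rather than from fixed coordinates.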
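Part 3 trains SVM actives-versus-decoys classifiers on BRS-3D features. The abstract gives no kernel, library or hyperparameters, so this is only a minimal stand-in: a linear SVM (hinge loss with L2 regularisation) trained by the Pegasos sub-gradient method on z-scored features, where centring the features replaces an explicit bias term. Labels are +1 for actives and -1 for decoys. `lam`, `epochs` and `seed` are hypothetical settings.

```python
import random
from typing import List, Sequence, Tuple

Model = Tuple[List[float], List[float], List[float]]


def standardize(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and standard deviation (a zero std is replaced by 1)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = []
    for j in range(d):
        std = (sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5
        stds.append(std if std > 0 else 1.0)
    return means, stds


def scale(x: Sequence[float], means: List[float], stds: List[float]) -> List[float]:
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def train_pegasos(X: Sequence[Sequence[float]], y: Sequence[int],
                  lam: float = 0.01, epochs: int = 50, seed: int = 0) -> Model:
    """Linear SVM via Pegasos; returns (weights, feature means, feature stds)."""
    means, stds = standardize(X)
    Z = [scale(x, means, stds) for x in X]
    w = [0.0] * len(Z[0])
    rng = random.Random(seed)
    for t in range(1, epochs * len(Z) + 1):
        i = rng.randrange(len(Z))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * zj for wj, zj in zip(w, Z[i]))
        # Shrink w (regularisation step), then add the hinge sub-gradient
        # if the sampled example violates the margin.
        w = [(1.0 - eta * lam) * wj for wj in w]
        if margin < 1.0:
            w = [wj + eta * y[i] * zj for wj, zj in zip(w, Z[i])]
    return w, means, stds


def predict(model: Model, x: Sequence[float]) -> int:
    """+1 (active) or -1 (decoy) from the sign of the linear score."""
    w, means, stds = model
    score = sum(wj * zj for wj, zj in zip(w, scale(x, means, stds)))
    return 1 if score >= 0 else -1
```

A kernel SVM from an established library would be the more usual choice for 300-dimensional BRS-3D data. The linear Pegasos version is shown only because it fits in a few self-contained lines.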
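Part 3 also reports that, for a few targets, 30% of the BRS-3D features performed about as well as the full set. The abstract does not name the selection method. The sketch below assumes a simple filter that ranks each feature by its Fisher score (squared difference of class means over the summed class variances) and keeps the top fraction. `fisher_scores` and `select_top_fraction` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """(mean_active - mean_decoy)^2 / (var_active + var_decoy) for each feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label != 1]
        mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
        vp = sum((v - mp) ** 2 for v in pos) / len(pos)
        vn = sum((v - mn) ** 2 for v in neg) / len(neg)
        # A small epsilon avoids division by zero for constant features.
        scores.append((mp - mn) ** 2 / (vp + vn + 1e-12))
    return scores


def select_top_fraction(scores: Sequence[float], fraction: float = 0.3) -> List[int]:
    """Indices of the highest-scoring features, keeping ceil(fraction * d) of them."""
    keep = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])
```

A filter like this is cheap but ignores interactions between templates. A wrapper method that retrains the SVM on candidate subsets is the costlier alternative.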
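Part 4 ranks database compounds by the cosine similarity between their BRS-3D vectors and the query's. The sketch below shows that ranking together with an enrichment factor (EF), one common measure of early recognition. The abstract does not state which early-recognition metric the thesis used, so EF here is an assumption, and `rank_by_cosine` and `enrichment_factor` are hypothetical names.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_by_cosine(query: Sequence[float],
                   database: Sequence[Sequence[float]]) -> List[int]:
    """Database indices sorted from most to least similar to the query."""
    sims = [cosine(query, v) for v in database]
    return sorted(range(len(database)), key=lambda i: sims[i], reverse=True)


def enrichment_factor(ranking: Sequence[int], is_active: Sequence[bool],
                      fraction: float) -> float:
    """Active rate in the top fraction of the ranking over the overall active rate."""
    n_total = len(ranking)
    n_top = max(1, round(fraction * n_total))
    total_actives = sum(1 for flag in is_active if flag)
    if total_actives == 0:
        raise ValueError("at least one active is required")
    hits = sum(1 for i in ranking[:n_top] if is_active[i])
    return (hits / n_top) / (total_actives / n_total)
```

Because cosine similarity ignores vector length, two molecules with proportional BRS-3D profiles score 1.0 even if one has uniformly lower shape overlaps.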
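Part 5 includes a kNN regression model on BRS-3D. The abstract gives neither k nor the distance measure, so this sketch assumes Euclidean distance and an unweighted mean of the k nearest training compounds' activities (for example, pIC50 values). `knn_predict` is a hypothetical helper.

```python
import math
from typing import Sequence


def knn_predict(X: Sequence[Sequence[float]], y: Sequence[float],
                query: Sequence[float], k: int = 3) -> float:
    """Mean activity of the k training compounds nearest to the query (Euclidean)."""
    if not 0 < k <= len(X):
        raise ValueError("k must be between 1 and the training-set size")
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    return sum(y[i] for i in nearest) / k
```

Weighting neighbours by inverse distance, or ranking them by cosine similarity instead of Euclidean distance, are equally plausible variants. The abstract does not say which was used.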
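Part 5 also chains several models into multi-step virtual screening workflows. The abstract does not list the stage order or cut-offs, so the sketch below only illustrates the cascade pattern. Each stage is a predicate, and only compounds that pass every earlier stage reach the next one. The stage names, score fields and thresholds are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Compound = Dict[str, float]
Stage = Tuple[str, Callable[[Compound], bool]]


def run_cascade(compounds: Sequence[Compound],
                stages: Sequence[Stage]) -> Tuple[List[Compound], List[Tuple[str, int]]]:
    """Apply filters in order; return the survivors and the count left after each stage."""
    survivors = list(compounds)
    log = []
    for name, keep in stages:
        survivors = [c for c in survivors if keep(c)]
        log.append((name, len(survivors)))
    return survivors, log


# Hypothetical stages over precomputed model outputs, for illustration only.
stages: List[Stage] = [
    ("bayes", lambda c: c["bayes_score"] > 0.0),
    ("svm", lambda c: c["svm_label"] == 1),
    ("knn", lambda c: c["knn_pic50"] >= 5.0),
]
```

Placing cheap filters first, such as a property-based Bayesian model, shrinks the library before costlier 3D steps like BRS-3D calculation or pharmacophore matching.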