
Virtual Screening Based On Ensemble Docking And Machine Learning

Posted on: 2022-10-25 | Degree: Master | Type: Thesis
Country: China | Candidate: X J Wu | Full Text: PDF
GTID: 2491306572480384 | Subject: Theoretical Physics
Abstract/Summary:
Drug design is an active research area in computational biology. Because computer simulation is inexpensive and fast, simulating the interaction between a target protein and small molecules with molecular docking software has become a common step in drug design. Existing docking software usually represents the flexibility of small molecules with multiple conformations, but treating the flexibility of the protein remains a major challenge; the most common approach is "ensemble docking". Virtual screening based on molecular docking has proven highly valuable in practice and is now an indispensable part of drug discovery. The open problem is how to select, from a large number of conformations, the protein structures that give good enrichment in virtual screening.
In this thesis, we use the homologous proteins of each target protein as its initial conformation set, and we design and implement a conformation selection scheme that combines ensemble docking and machine learning. The scheme selects the protein structures with better enrichment, that is, those that better distinguish active ligand molecules from inactive compounds. First, the scheme is designed under the premise that the small molecules are unknown. Second, the scheme is based on computer simulation of molecular docking, with a knowledge-based scoring function used to evaluate the interactions between molecules; the enrichment is assessed from the calculated AUC value.
The study uses ten different data sets from the DUD database and takes the apo structure of each group of proteins as a template to search for homologous proteins with sequence similarity above 90%. The selected homologous proteins are clustered according to RMSD. After docking, the enrichment rate and AUC value of each protein conformation are calculated from the scoring function, and the homologous proteins with better enrichment are selected. Finally, a machine learning algorithm is used to select the homologous protein structures with better enrichment. The algorithm draws on the local characteristics of the target protein: atom-level and residue-level features are extracted as input to a random forest algorithm. The best parameters are searched with GridSearchCV, and five-fold cross-validation is carried out. The results show that the algorithm achieves a high success rate in selecting homologous proteins among the Top1, Top3 and Top5 structures.
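As an illustration of how enrichment can be quantified from docking scores, the following is a minimal sketch and not the thesis's actual code. It assumes that a lower score means a better predicted binding pose and that each compound is labeled 1 (active) or 0 (inactive decoy). The function names `roc_auc`, `enrichment_factor` and `rank_conformations` are hypothetical and introduced only for this example.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random active outscores a random decoy.

    Assumption: lower docking score = better. Ties count as half a win.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores, labels, fraction=0.1):
    """Ratio of the active rate in the top-ranked fraction to the overall active rate."""
    n = len(scores)
    total_actives = sum(labels)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, int(round(n * fraction)))
    # Rank compounds from best (lowest) to worst (highest) score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])
    hits_top = sum(y for _, y in ranked[:n_top])
    return (hits_top / n_top) / (total_actives / n)


def rank_conformations(results):
    """Order protein conformations by AUC, best first.

    `results` maps a conformation name to a (scores, labels) pair.
    """
    aucs = {name: roc_auc(s, y) for name, (s, y) in results.items()}
    order = sorted(aucs, key=aucs.get, reverse=True)
    return order, aucs
```

Given per-conformation docking results, `rank_conformations` returns the conformation names sorted by AUC, which is one simple way to pick the Top1, Top3 or Top5 structures for screening.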
Keywords/Search Tags: Molecular docking, Virtual screening, Enrichment effect, Machine learning, Homologous protein