Research On Virtual Screening Method Of Drug Proteins Based On Imbalanced Data Mining

Posted on: 2022-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhao
GTID: 2491306314468674
Subject: Computer technology
Abstract/Summary:
With the rapid development of computer technology and continuing advances in biochemistry, computer-aided drug design has gradually been introduced into the development of new drugs. Within this field, virtual screening based on molecular docking is widely used, but it still has limitations. On the one hand, the result of docking-based virtual screening depends largely on the scoring function applied afterwards, and because of theoretical and practical limitations no scoring function is yet both accurate and precise. On the other hand, docking-based virtual screening yields far fewer active compounds than inactive ones, which creates a data imbalance problem. This thesis therefore proposes a virtual drug screening method for imbalanced data that combines machine learning with virtual screening to improve the screening process. First, because traditional scoring functions tend to misjudge the binding conformation of the target protein and the ligand, which lowers scoring accuracy, the thesis encodes the protein-ligand docking conformation with SPLIF interaction fingerprints instead of using a scoring function. This one-dimensional encoding makes the data easier to sample and classify and also improves the accuracy of the evaluation. Second, to address the imbalance caused by the small proportion of active compounds among the docking conformations in practical virtual screening, the thesis proposes an improved genetic algorithm based on SMOTE to preprocess the active-compound samples; this reduces the imbalance ratio of the data while preserving its integrity. To further improve the accuracy of the screening process, the idea of ensemble learning is introduced: random forest (RF), an extension of the Bagging ensemble algorithm, is combined with the support vector machine (SVM) to form the proposed RF-SVM method for classifying the interaction fingerprints generated from molecular docking conformations. RF-SVM not only effectively avoids RF overfitting but also improves the accuracy and stability of the classifier. Finally, positive and negative samples are selected from the PDB, PubChem, and sc-PDB databases to simulate the virtual screening process. The experimental results show that the proposed method effectively improves the accuracy of virtual drug screening.
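The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of plain SMOTE-style interpolation. This is not the thesis's improved genetic-algorithm variant, whose details are not given here. The function name `smote`, its parameters, and the toy fingerprint vectors are assumptions made purely for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): 3 active (minority) vs 9 inactive (majority) vectors.
actives = [[1.0, 0.0, 2.0], [1.0, 1.0, 2.0], [2.0, 0.0, 3.0]]
inactives = [[0.0, 5.0, 0.0] for _ in range(9)]

new_actives = smote(actives, n_new=len(inactives) - len(actives))
balanced_actives = actives + new_actives
print(len(balanced_actives), len(inactives))
```

Because every synthetic sample is interpolated between two existing active samples, the original data are kept intact and only the minority class grows. A classifier such as the RF-SVM combination described above would then be trained on the balanced set.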
Keywords/Search Tags: Virtual screening, interaction fingerprint, imbalanced data, preprocessing, random forest