Research On The Virtual Screening Of Drug Protein Based On The Machine Learning

Posted on: 2017-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Wang
Full Text: PDF
GTID: 2334330482986433
Subject: Software engineering
Abstract/Summary:
In the late twentieth century, the rapid development of computer technology brought new drug discovery into a new phase. Among computational approaches, virtual screening based on molecular docking has been adopted by most research institutions and pharmaceutical companies because of its broad applicability. However, the accuracy of this strategy depends to a large extent on the accuracy of the scoring function, and because of limitations in theory and method, it is still not a completely accurate approach. In addition, the virtual screening process requires a large number of experimentally determined crystal structures. When too few structures are available for the target under study, docking poses or homology modeling data have to be added. The quality of the resulting mixed data set may then fall because incorrect docking poses are mixed in, which reduces screening efficiency. In recent years, with the continuous development of machine learning techniques, applying machine learning theory to improve virtual screening has become a research focus. Current machine learning algorithms cannot give a computer a learning ability as strong as a human's, but algorithms proposed for many specific learning tasks allow a computer to extract features from large amounts of data and discover hidden patterns. Machine learning has therefore been introduced into computer-aided drug design as a powerful auxiliary tool.

Against this background, this paper proposes a method that uses machine learning to improve the virtual screening process based on molecular docking. We use the protein-ligand interaction fingerprint to encode the interactions between a protein and its ligand, and we use ensemble learning to reduce the influence that erroneous docking results mixed into the data have on the final screening results.
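As a rough illustration of what encoding interactions as a fingerprint can look like, the sketch below maps each (residue, interaction type) pair to one bit and compares two fingerprints with the Tanimoto coefficient. The interaction types, the residue labels, and the function names `encode_ifp` and `tanimoto` are assumptions made for this example; the abstract does not specify the fingerprint scheme used in the thesis.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical interaction types; a real IFP scheme defines its own set.
INTERACTION_TYPES = ("hydrophobic", "hbond_donor", "hbond_acceptor", "ionic")


def encode_ifp(
    contacts: Iterable[Tuple[str, str]],
    residues: Sequence[str],
    interaction_types: Sequence[str] = INTERACTION_TYPES,
) -> List[int]:
    """Return a bit vector with one bit per (residue, interaction type) pair."""
    n_types = len(interaction_types)
    index = {
        (res, kind): i * n_types + j
        for i, res in enumerate(residues)
        for j, kind in enumerate(interaction_types)
    }
    bits = [0] * (len(residues) * n_types)
    for contact in contacts:
        # Contacts with unknown residues or types are ignored.
        if contact in index:
            bits[index[contact]] = 1
    return bits


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto similarity of two equal-length bit vectors."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


# Illustrative binding-site residues and contacts (made-up labels).
site = ("LEU273", "MET341")
pose = [("LEU273", "hydrophobic"), ("MET341", "hbond_acceptor")]
print(encode_ifp(pose, site))  # [1, 0, 0, 0, 0, 0, 1, 0]
```

A fixed residue order is what makes fingerprints from different ligands comparable bit by bit, which is why the sketch takes the residue list as an explicit argument.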
This paper first introduces the concept and methods of virtual screening and reviews the results obtained at home and abroad by combining machine learning with virtual screening. It then introduces the concept and development of the docking-based virtual screening process and of the protein-ligand interaction fingerprint (IFP). To demonstrate that the proposed method is effective, this paper selects SRC and Cathepsin K, two drug targets of strong current interest in the pharmaceutical field. The SC-PDB and PDB databases were used to prepare the related data, and a BP neural network was then used to predict the protein-ligand interaction fingerprints of these two target proteins. In addition, a genetic algorithm and a simulated annealing algorithm are added to the plain BP neural network to address its slow convergence and its tendency to fall into local optima during training. In the virtual screening phase, the IFPs generated by the machine learning algorithm are used as the input to the classification algorithm, and docking poses are added to simulate the actual situation. To address the low quality of the training set and the resulting poor virtual screening performance, this paper introduces ensemble learning at the algorithm level and optimizes the new virtual screening process. In the experimental construction and analysis part, the PDB database and the StARLITe database are used to verify the effectiveness of the proposed method. Experimental results show that the proposed method can effectively improve the accuracy of virtual screening and can offer some guidance for the development of new drugs.
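The ensemble idea mentioned above can be sketched with bagging: several simple classifiers are each trained on a bootstrap resample of the fingerprint set, and their predictions are combined by majority vote, so that a few mislabeled or wrongly docked examples are less likely to dominate the result. This is only a minimal sketch under stated assumptions. The base learner (1-nearest-neighbour on Hamming distance), the function names, and the toy data are all illustrative choices, not the classifier or ensemble scheme used in the thesis.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[int], int]  # (fingerprint bits, label: 1 active, 0 inactive)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_label(train: Sequence[Example], query: Sequence[int]) -> int:
    """Label of the training fingerprint closest to the query."""
    return min(train, key=lambda item: hamming(item[0], query))[1]


def majority_vote(votes: Sequence[int]) -> int:
    """Most common label among the votes."""
    return Counter(votes).most_common(1)[0][0]


def bagged_predict(
    train: Sequence[Example],
    query: Sequence[int],
    n_models: int = 25,
    seed: int = 0,
) -> int:
    """Majority vote of base learners, each fit on a bootstrap resample."""
    rng = random.Random(seed)
    votes: List[int] = []
    for _ in range(n_models):
        # Bootstrap: draw len(train) examples with replacement.
        sample = [rng.choice(train) for _ in train]
        votes.append(nearest_neighbour_label(sample, query))
    return majority_vote(votes)


# Toy data: the last example mimics a wrong docking pose labeled as active.
training_set = [
    ([1, 1, 0, 0], 1),
    ([1, 1, 1, 0], 1),
    ([0, 0, 1, 1], 0),
    ([0, 0, 0, 1], 0),
    ([0, 0, 1, 1], 1),  # noisy label
]
print(bagged_predict(training_set, [1, 1, 0, 1]))
```

Because each resample omits some training examples, a single noisy entry influences only part of the ensemble, which is the property the abstract relies on when docking poses of uncertain quality are mixed into the training data.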
Keywords/Search Tags: machine learning, ensemble learning, virtual screening, protein ligand interaction fingerprint