Research On Parallel Technology For Key Algorithms Of High Throughput Drug Virtual Screening Based On CPU-MIC Cooperation

Posted on: 2016-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q Cheng
GTID: 2334330536967348
Subject: Computer Science and Technology
Abstract/Summary:
Drug virtual screening uses molecular docking to calculate the binding energy between a protein and the small molecules in a compound library, and thereby predicts the physiological activity of candidate compounds. It is very important for finding drug candidates when a new acute infectious disease breaks out. At present, there are over 3.5 million purchasable molecular compounds on earth, and screening all of them against a single protein target takes more than ten years. Even on the TianHe-II supercomputer, the current high-throughput method takes at least tens of days. It is therefore necessary to develop a high-throughput drug virtual screening platform that can respond to sudden malignant diseases such as Ebola hemorrhagic fever. To address this, this thesis optimizes the drug virtual screening software D3DOCKxb around its key algorithm, the Lamarckian Genetic Algorithm (LGA). First, we implemented an efficient parallel LGA on multi-core CPUs. Then we focused on a parallel molecular docking algorithm based on the CPU-MIC collaborative model. Finally, we developed a large-scale high-throughput virtual screening platform on TianHe-II that is able to complete the screening of all drug molecules on earth within a day. Our work includes:

1. D3DOCKxb is a derivative of AutoDock (version 4.2.3) that focuses on the effect of halogen bonds in drug discovery by adding the knowledge-based scoring function XBPMF and the quantum-mechanics-based scoring function XBScoreQM. However, its memory utilization is poor and its data structures are complex, and the two scoring functions add a large amount of computation, resulting in low efficiency. To solve this problem, this thesis redesigns the algorithm and data structures and implements an efficient parallel LGA on CPU cores by encapsulating the scoring functions, replacing IO with buffers, binding threads, and replacing the copyin primitive with Write-Back-First-Time. Multi-threaded D3DOCKxb obtained linear speedup on the 24 CPU cores of a TianHe-II node.

2. The Intel Xeon Phi coprocessor, also called MIC (Many Integrated Core), is powerful hardware for parallel acceleration and floating-point computing. We ported D3DOCKxb to MIC in offload mode with many optimizations, such as vectorization, memory reuse, and merging of parallel regions, achieving a 12x speedup on a single MIC card. We then built a new molecular docking program based on CPU-MIC heterogeneous collaboration, called mD3DOCKxb. mD3DOCKxb achieves a 50x speedup on one TianHe-II node (24 CPU cores and three MIC cards).

3. High-throughput applications must cope with massive IO and instantaneous communication, so we developed a highly efficient communication engine for mD3DOCKxb. The engine solves the problems of dynamic partitioning and load balancing through multi-layer control and staged job handling. In addition, it avoids massive IO and instantaneous communication by putting processes to sleep according to their rank. mD3DOCKxb keeps parallel efficiency above 84% on 500, 1000, 2000, 4000, 6000, and 8000 nodes of TianHe-II; even on 8000 nodes (196,000 CPU cores + 1,368,000 MIC cores) the parallel efficiency is 84.7%. mD3DOCKxb finished 35 million dockings between drug molecules and the Ebola virus protein VP35 within 20 hours.
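
For readers unfamiliar with the Lamarckian Genetic Algorithm named in the abstract, the following is a minimal sketch of the general idea and not the thesis's D3DOCKxb implementation. It assumes a toy quadratic `energy` function in place of a real docking score, and a simple random hill climb in place of the Solis and Wets local search that AutoDock uses. All function names and parameters here are hypothetical. The defining Lamarckian step is that coordinates refined by local search are written back into the individual, so offspring inherit them.

```python
import random


def energy(x):
    """Toy stand-in for a docking score (assumption): lower is better."""
    return sum(v * v for v in x)


def local_search(x, rng, steps=20, step_size=0.1):
    """Random-perturbation hill climb; a simplified stand-in for local search."""
    best, best_e = list(x), energy(x)
    for _ in range(steps):
        cand = [v + rng.gauss(0.0, step_size) for v in best]
        e = energy(cand)
        if e < best_e:  # keep only improving moves
            best, best_e = cand, e
    return best


def lamarckian_ga(dim=3, pop_size=20, generations=30, ls_rate=0.3, seed=0):
    """Return (initial best energy, final best energy, best individual)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    initial_best = min(energy(p) for p in pop)
    for _ in range(generations):
        pop.sort(key=energy)
        parents = pop[: pop_size // 2]      # truncation selection
        children = [list(pop[0])]           # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            if rng.random() < 0.2:          # occasional mutation of one gene
                i = rng.randrange(dim)
                child[i] += rng.gauss(0.0, 1.0)
            if rng.random() < ls_rate:
                # Lamarckian step: the locally refined coordinates replace
                # the child's genotype and are passed on to later generations.
                child = local_search(child, rng)
            children.append(child)
        pop = children
    best = min(pop, key=energy)
    return initial_best, energy(best), best


if __name__ == "__main__":
    start, end, _ = lamarckian_ga(seed=1)
    print(f"best energy: {start:.4f} -> {end:.6f}")
```

In a docking setting, each individual would encode a ligand pose (translation, orientation, torsions) and `energy` would be the scoring function. Independent docking runs like this one share no state, which is what makes the multi-core and CPU-MIC parallelization described above possible.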
Keywords/Search Tags: Drug Virtual Screening, CPU-MIC Collaboration, Tianhe-2, Lamarckian Genetic Algorithm, High Scalability, High Throughput