The Compound Similarity Analysis System Based On Cloud Computing Technology
Posted on: 2013-01-15 | Degree: Master | Type: Thesis | Country: China | Candidate: J H Li | GTID: 2241330395451101 | Subject: Computer software and theory

Abstract/Summary:

As society becomes increasingly digitized, the volume of data that must be processed is growing explosively, and new technology for storing, processing, and managing data is required. Cloud computing abstracts heterogeneous hardware into a single large pool of computing resources and provides those resources to users on demand, which makes it a good fit for large-scale data problems.

The prediction of biologically active compounds is of great importance for high-throughput screening (HTS) approaches in drug discovery and chemical genomics. One of the basic principles behind ligand-based activity prediction models is the widely accepted similar property principle, which rests on the observation that chemicals with similar structures frequently share similar physicochemical properties and biological activities. Measuring similarity, however, can be very costly and time-consuming, and recent advances in compound synthesis keep making more compounds available. As a result, most traditional analysis tools that run on a standalone machine are inadequate for the task.

In this thesis, we study the relevant theory, characteristics, and applications of cloud computing and present a solution for building a new similarity-measurement system based on cloud computing technology.
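As a rough illustration of what a map/reduce-style similarity computation might look like, the sketch below scores every pair of compounds and then groups the close matches. The abstract does not name the similarity measure, so the Tanimoto coefficient over binary fingerprints is an assumption here, as are the function names, the compound identifiers, and the threshold; plain Python generators stand in for the actual Hadoop map and reduce phases.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def map_pairs(compounds):
    """Map step (illustrative): emit ((id_a, id_b), similarity) per unordered pair."""
    for (id_a, fp_a), (id_b, fp_b) in combinations(sorted(compounds.items()), 2):
        yield (id_a, id_b), tanimoto(fp_a, fp_b)


def reduce_similar(records, threshold):
    """Reduce step (illustrative): keep, per compound, neighbours at or above threshold."""
    neighbours = {}
    for (id_a, id_b), sim in records:
        if sim >= threshold:
            neighbours.setdefault(id_a, []).append(id_b)
            neighbours.setdefault(id_b, []).append(id_a)
    return neighbours


# Hypothetical toy fingerprints: each set lists the bit positions that are set.
compounds = {
    "c1": frozenset({1, 2, 3, 4}),
    "c2": frozenset({1, 2, 3, 5}),
    "c3": frozenset({7, 8}),
}
result = reduce_similar(map_pairs(compounds), threshold=0.5)
```

The number of pairs grows quadratically with the number of compounds, which is why the pairwise scoring step is the natural candidate for distribution across many nodes.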
The main contributions of this thesis are as follows:
(1) A parallelized solution based on the MapReduce programming model that meets the massive computing requirements of similarity measurement.
(2) A round-robin-style partitioner algorithm that reduces the impact of data skew, verified through extensive testing. Experiments show that the new partitioner improves performance by 5%.
(3) An improved fault-tolerance mechanism that reduces unnecessary recomputation when a slave node fails.
(4) An implementation of the parallelized method using Apache Hadoop, with extensive tests of its efficiency and scalability. Experiments show that the speedup efficiency can exceed 0.9.

Keywords/Search Tags: biologically active compounds, bioinformatics, cloud computing, scalability, Hadoop
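Contributions (2) and (4) can be made concrete with a small sketch. The abstract does not describe the partitioner's internals, so the class below is only an assumed reading of "round-robin style": records are dealt to reducers in rotation instead of by key hash, which only makes sense when records with the same key do not need to meet at the same reducer. The efficiency function uses the standard definition of parallel efficiency (speedup divided by worker count), which is presumably what "speedup efficiency" refers to. All names and the sample keys are hypothetical.

```python
from collections import Counter


def hash_partition(key: int, num_reducers: int) -> int:
    """Stand-in for a hash-based partitioner: reducer chosen from the key."""
    return key % num_reducers


class RoundRobinPartitioner:
    """Assumed sketch: ignore the key and assign records to reducers in rotation."""

    def __init__(self, num_reducers: int):
        self.num_reducers = num_reducers
        self._next = 0

    def partition(self, key: int) -> int:
        reducer = self._next
        self._next = (self._next + 1) % self.num_reducers
        return reducer


def parallel_efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Speedup (t_serial / t_parallel) divided by the number of workers."""
    return t_serial / (workers * t_parallel)


# Skewed toy input: six of the eight keys collide on reducer 0 under hashing.
keys = [0, 4, 8, 12, 16, 20, 1, 2]
hash_load = Counter(hash_partition(k, 4) for k in keys)

rr = RoundRobinPartitioner(4)
rr_load = Counter(rr.partition(k) for k in keys)
```

With these keys the hash-based stand-in sends six records to one reducer, while the rotation gives each of the four reducers two records. An efficiency above 0.9 on four workers, for example, would correspond to a speedup of more than 3.6 over the serial run.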