The Compound Similarity Analysis System Based On Cloud Computing Technology

Posted on:2013-01-15

Degree:Master

Type:Thesis

Country:China

Candidate:J H Li

Full Text:PDF

GTID:2241330395451101

Subject:Computer software and theory

Abstract/Summary:

As society becomes increasingly digitized, the volume of data that must be processed is growing explosively, and new technologies for storing, processing, and managing data are required. Cloud computing abstracts heterogeneous hardware resources into a large computing resource pool and provides computing resources to users on demand, which makes it a good fit for large-scale data problems.

The prediction of biologically active compounds is of great importance for high-throughput screening (HTS) approaches in drug discovery and chemical genomics. One of the basic principles behind ligand-based activity prediction models is the widely accepted similar property principle, which rests on the observation that chemicals with similar structures frequently share similar physicochemical properties and biological activities. Measuring similarity, however, can be very costly and time-consuming, especially as recent advances in compound synthesis make more and more compounds available. As a result, most traditional analysis tools, which run on a single standalone machine, are inadequate for the task.

In this thesis, we study the relevant theory, characteristics, and applications of cloud computing and propose a new system for measuring compound similarity based on cloud computing technology. The main work and contributions are as follows:

(1) A parallelized solution based on the MapReduce programming model that meets the massive computational requirements of similarity measurement.
(2) A round-robin-style partitioner algorithm that reduces the impact of data skew, verified with extensive tests. Experiments show that the new partitioner improves performance by 5%.
(3) An improved fault-tolerance mechanism that reduces unnecessary computation when a slave node fails.
(4) An implementation of the parallelized method using Apache Hadoop, with extensive tests to verify its efficiency and scalability. Experiments show that the speedup efficiency can exceed 0.9.
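
The ideas named in the abstract can be illustrated with a small, self-contained sketch. It is not the thesis implementation, which the abstract does not describe in detail. Several things here are assumptions made only for illustration: the Tanimoto coefficient over fingerprint bit sets as the similarity measure (the abstract does not name one), the helper names `tanimoto`, `hash_partition`, `RoundRobinPartitioner`, `reducer_loads`, and `speedup_efficiency`, and the usual definition of speedup efficiency as speedup divided by the number of nodes. The sketch is in plain Python rather than Hadoop's Java API, and it only simulates how records would be assigned to reducers.

```python
import zlib


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient of two fingerprint bit sets.

    Assumed similarity measure; the abstract does not specify one.
    """
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def hash_partition(key: str, num_reducers: int) -> int:
    """Key-hash partitioning: every record with the same key goes to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


class RoundRobinPartitioner:
    """Hypothetical round-robin partitioner: ignores the key and cycles through reducers.

    This only suits jobs where records sharing a key do not have to meet
    at the same reducer, e.g. independent pairwise comparisons.
    """

    def __init__(self) -> None:
        self._next = 0

    def __call__(self, key: str, num_reducers: int) -> int:
        target = self._next % num_reducers
        self._next += 1
        return target


def reducer_loads(keys, partition, num_reducers: int) -> list:
    """Count how many records each reducer would receive."""
    loads = [0] * num_reducers
    for key in keys:
        loads[partition(key, num_reducers)] += 1
    return loads


def speedup_efficiency(t_single: float, t_parallel: float, nodes: int) -> float:
    """Assumed definition: (T_1 / T_n) / n."""
    return (t_single / t_parallel) / nodes


if __name__ == "__main__":
    # Skewed input: 90 of 100 records share one "hot" key.
    keys = ["hot"] * 90 + [f"k{i}" for i in range(10)]
    print("hash loads:       ", reducer_loads(keys, hash_partition, 4))
    print("round-robin loads:", reducer_loads(keys, RoundRobinPartitioner(), 4))
    print("similarity:       ", tanimoto(frozenset({1, 2, 3}), frozenset({2, 3, 4})))
```

With key-hash partitioning, all records for the hot key land on a single reducer, so that reducer becomes the straggler; cycling through reducers spreads the same records evenly, at the cost of no longer grouping records by key.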

Keywords/Search Tags:

biologically active compounds, bioinformatics, cloud computing, scalability, Hadoop

Related items

1	Biodegradable soy-polymer delivery system for slow-release of micronutrients and biologically active compounds
2	Stereoselective Syntheses Of β-carboline Alkaloids And Biologically Active Compounds
3	Advancing sample preparation, separation, and detection methods in capillary electrophoresis for the analysis of biologically active compounds
4	Research And Implementation Of Coal Mine Dynamic Disaster Monitoring System Based On Hadoop Cloud Platform
5	Migrating High Throughput Material Simulations To Elastic Cloud Computing
6	Distributed Real Time Calculation And Application Of Pipeline Network Based On Hadoop
7	Research And Design On Hadoop–Based Driving Behavior Evaluation Model Of Fossil Oil Transport Vehicles
8	Research And Design Of A Pressure-controlled Drilling Overflow Monitoring And Diagnosis System Based On Cloud Computing
9	Research On Quantitative Proportions In Complex Systems Of Vegetable Blended Oils Based On Multiple Regression Model And Cloud Computing
10	Application Research Of Financial Sharing Center Of Y Company Under The Background Of Cloud Computing