
A Resource Sharing And Error Isolation Mechanism For Multi-task Data Processing Engine

Posted on: 2015-09-15    Degree: Master    Type: Thesis
Country: China    Candidate: F L Wan    Full Text: PDF
GTID: 2308330452457200    Subject: Computer technology
Abstract/Summary:
MapReduce is a state-of-the-art computation paradigm that has been widely used for processing and analyzing large-scale datasets in both industry and academia. Hadoop is an open-source implementation of MapReduce that follows a master/slaves model. Like other systems that adopt this model, Hadoop suffers from a single point of failure (SPoF) in the JobTracker. In this paper, we analyze previous work on the SPoF and propose a solution to the single point of failure in Hadoop, which we call Colt.

Colt targets the single point of failure of MapReduce and builds on the distributed coordination service Zookeeper, adopting it as a management component integral to the system rather than an optional one. In our design, we change the master/slaves model into a two-level network model, i.e. Master-Zookeeper-Slaves. Under this design the master becomes stateless, and the master and slaves no longer need to connect to each other directly; instead, they communicate via Zookeeper. When the master fails, the slaves keep working as if nothing had happened; once the master restarts, the system and any failed jobs return to their normal state. To make the performance of Colt comparable to that of the original Hadoop, we take full advantage of Zookeeper and process heartbeats and schedule tasks in parallel.

Our design is implemented on Hadoop-1.2.1. Evaluations of Colt and Hadoop show that Colt addresses the single point of failure of the JobTracker well while keeping performance within 5% of the original Hadoop.
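The indirection described above (a stateless master and slaves that exchange all state through the coordination service rather than directly) can be sketched as follows. This is a minimal, hypothetical illustration: an in-memory dictionary stands in for Zookeeper's znode namespace, and all class, method, and path names (`CoordStore`, `/slaves/...`, `/tasks/...`) are invented for the example; they are not Colt's actual API.

```python
import time


class CoordStore:
    """In-memory stand-in for Zookeeper: a flat namespace of znode-like paths."""

    def __init__(self):
        self.nodes = {}

    def set(self, path, data):
        self.nodes[path] = data

    def get(self, path):
        return self.nodes.get(path)

    def children(self, prefix):
        return [p for p in self.nodes if p.startswith(prefix + "/")]


class Slave:
    """A slave talks only to the store, never to the master directly."""

    def __init__(self, store, name):
        self.store, self.name = store, name

    def heartbeat(self):
        # Heartbeats go into the store; the master reads them from there.
        self.store.set(f"/slaves/{self.name}/heartbeat", time.time())

    def run_assigned(self):
        task = self.store.get(f"/slaves/{self.name}/task")
        if task and self.store.get(f"/tasks/{task}") == "PENDING":
            self.store.set(f"/tasks/{task}", "DONE")


class Master:
    """Stateless master: all job/task state lives in the store, so a freshly
    restarted master rebuilds its view from the store alone."""

    def __init__(self, store):
        self.store = store

    def submit(self, tasks):
        for t in tasks:
            self.store.set(f"/tasks/{t}", "PENDING")

    def schedule(self):
        slaves = {p.split("/")[2] for p in self.store.children("/slaves")}
        pending = [p.split("/")[2] for p in self.store.children("/tasks")
                   if self.store.get(p) == "PENDING"]
        for slave, task in zip(sorted(slaves), pending):
            self.store.set(f"/slaves/{slave}/task", task)


store = CoordStore()
slaves = [Slave(store, f"s{i}") for i in range(2)]
for s in slaves:
    s.heartbeat()

master = Master(store)
master.submit(["t1", "t2", "t3"])
master.schedule()
for s in slaves:
    s.run_assigned()

# The master "crashes": slaves keep heartbeating and working off the store.
for s in slaves:
    s.heartbeat()

# A new master instance recovers the remaining pending task from the store
# and resumes scheduling, exactly because no state lived in the old master.
master = Master(store)
master.schedule()
for s in slaves:
    s.run_assigned()
```

The key point the sketch illustrates is that replacing the old `Master(store)` with a new one loses nothing: every task's status and every assignment is a path in the store, so recovery is just re-reading it.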
Keywords/Search Tags: MapReduce, Single Point of Failure, Error Isolation, Recovery in Task Granularity