
A Resource Sharing And Error Isolation Mechanism For Multi-task Data Processing Engine

Posted on: 2015-09-15    Degree: Master    Type: Thesis
Country: China    Candidate: F L Wan    Full Text: PDF
GTID: 2308330452457200    Subject: Computer technology
Abstract/Summary:
MapReduce is a state-of-the-art computation paradigm that has been widely used for processing and analyzing large-scale datasets in both industry and academia. Hadoop is an open-source implementation of MapReduce that follows a master/slaves model. Like other systems that adopt this model, Hadoop suffers from a single point of failure (SPoF) in the JobTracker. In this paper, we analyze previous work on the SPoF and propose a solution to the single point of failure in Hadoop, which we call Colt.

Colt targets the single point of failure of MapReduce and builds on the distributed coordination service Zookeeper, adopting it as a management component integral to the system rather than an optional one. In our design, we change the master/slaves model into a two-level network model, i.e. Master-Zookeeper-Slaves. Under this design the master becomes stateless, and the master and slaves no longer need to connect to each other directly; instead, they communicate via Zookeeper. When the master fails, the slaves keep working as if nothing had happened; once the master restarts, the system and any failed jobs return to their normal state. To make the performance of Colt comparable to that of the original Hadoop, we take full advantage of Zookeeper and process heartbeats and schedule tasks in parallel.

Our design is implemented on Hadoop-1.2.1. Evaluations of Colt and Hadoop show that Colt addresses the single point of failure of the JobTracker well while keeping performance within 5% of the original Hadoop.
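The indirection described above (a stateless master and slaves that exchange all state through the coordination service rather than directly) can be sketched as follows. This is a minimal, hypothetical illustration: an in-memory dictionary stands in for Zookeeper's znode namespace, and all class, method, and path names (`CoordStore`, `/slaves/...`, `/tasks/...`) are invented for the example; they are not Colt's actual API.

```python
import time


class CoordStore:
    """In-memory stand-in for Zookeeper: a flat namespace of znode-like paths."""

    def __init__(self):
        self.nodes = {}

    def set(self, path, data):
        self.nodes[path] = data

    def get(self, path):
        return self.nodes.get(path)

    def children(self, prefix):
        return [p for p in self.nodes if p.startswith(prefix + "/")]


class Slave:
    """A slave talks only to the store, never to the master directly."""

    def __init__(self, store, name):
        self.store, self.name = store, name

    def heartbeat(self):
        # Heartbeats go into the store; the master reads them from there.
        self.store.set(f"/slaves/{self.name}/heartbeat", time.time())

    def run_assigned(self):
        task = self.store.get(f"/slaves/{self.name}/task")
        if task and self.store.get(f"/tasks/{task}") == "PENDING":
            self.store.set(f"/tasks/{task}", "DONE")


class Master:
    """Stateless master: all job/task state lives in the store, so a freshly
    restarted master rebuilds its view from the store alone."""

    def __init__(self, store):
        self.store = store

    def submit(self, tasks):
        for t in tasks:
            self.store.set(f"/tasks/{t}", "PENDING")

    def schedule(self):
        slaves = {p.split("/")[2] for p in self.store.children("/slaves")}
        pending = [p.split("/")[2] for p in self.store.children("/tasks")
                   if self.store.get(p) == "PENDING"]
        for slave, task in zip(sorted(slaves), pending):
            self.store.set(f"/slaves/{slave}/task", task)


store = CoordStore()
slaves = [Slave(store, f"s{i}") for i in range(2)]
for s in slaves:
    s.heartbeat()

master = Master(store)
master.submit(["t1", "t2", "t3"])
master.schedule()
for s in slaves:
    s.run_assigned()

# The master "crashes": slaves keep heartbeating and working off the store.
for s in slaves:
    s.heartbeat()

# A new master instance recovers the remaining pending task from the store
# and resumes scheduling, exactly because no state lived in the old master.
master = Master(store)
master.schedule()
for s in slaves:
    s.run_assigned()
```

The key point the sketch illustrates is that replacing the old `Master(store)` with a new one loses nothing: every task's status and every assignment is a path in the store, so recovery is just re-reading it.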
Keywords/Search Tags: MapReduce, Single Point of Failure, Error Isolation, Recovery in Task Granularity