
Research on Job Scheduling on the Hadoop Platform Based on Load Balancing

Posted on: 2014-01-11; Degree: Master; Type: Thesis
Country: China; Candidate: D Hu; Full Text: PDF
GTID: 2248330398967714; Subject: Computer software and theory
Abstract/Summary:
With the continuous development of information technology, enterprise IT systems have accumulated large amounts of data closely tied to business operations. This data can be regarded as the core of enterprise development, and every IT system depends on it. All industries generate vast amounts of data every day, and the total volume is growing explosively. According to statistics from IDC (International Data Corporation), the global data volume reached 1.2 million PB (about 1.2 ZB) at the end of 2010, and the amount of data stored in electronic form worldwide is projected to reach 35 ZB by the end of 2020. The era of big data has arrived, and Hadoop emerged alongside it: a software framework for the reliable, efficient, and scalable distributed processing of massive data. The scheduler is a key component of the Hadoop platform. Its main function is to assign the free resources in the system to jobs according to a given policy, so it plays a vital role in resource allocation and job execution across the whole system. Studying Hadoop's job scheduling algorithms is therefore of great significance.

This dissertation first introduces the history and architecture of Hadoop, then describes its two core technologies in more detail: the Hadoop Distributed File System (HDFS) and the MapReduce distributed data-processing framework. It then analyzes the principles, advantages, and disadvantages of Hadoop's original scheduling algorithm and of the LATE (Longest Approximate Time to End) scheduling algorithm. Because LATE lacks a strategy for choosing the node on which to run the backup copy of a slow (straggler) task, we propose an improved LATE scheduling algorithm. It classifies the workload in the Hadoop cluster, defines methods for measuring each node's workload, and uses these measurements in a new strategy for selecting the backup execution node for slow tasks.
Finally, we describe how a six-node Hadoop cluster was built and run comparative experiments on it with the LATE and the improved LATE scheduling algorithms. The experimental results show that the improved LATE scheduling algorithm has certain advantages.
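The abstract does not give the improved node-selection formula, so the Python sketch below is illustrative only. The straggler rule for Hadoop's original scheduler (progress score below the category average minus 0.2, after at least one minute of running) and the LATE heuristics (progress rate, estimated time left, a SlowTaskThreshold and SlowNodeThreshold at the 25th percentile, a SpeculativeCap of 10% of task slots) follow the published description of LATE by Zaharia et al. (2008). The `percentile` helper uses linear interpolation as a modeling choice. `pick_backup_node`, the `cpu_util`/`io_util` fields, and the 0.7/0.3 weights are hypothetical stand-ins for the thesis's workload classification and node-load measure, not the author's actual method.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    progress: float           # progress score in [0, 1]
    elapsed: float            # seconds since the task started
    speculated: bool = False  # True if a backup copy is already running

    @property
    def progress_rate(self) -> float:
        # LATE: progress rate = progress score / elapsed time
        return self.progress / self.elapsed if self.elapsed > 0 else 0.0

    @property
    def time_left(self) -> float:
        # LATE: estimated time left = (1 - progress score) / progress rate
        rate = self.progress_rate
        return (1.0 - self.progress) / rate if rate > 0 else math.inf


@dataclass
class Node:
    name: str
    total_progress: float  # sum of progress scores of finished and running tasks on the node
    cpu_util: float        # HYPOTHETICAL load metrics in [0, 1]
    io_util: float


def percentile(values, q):
    """Linear-interpolation percentile of a non-empty list, q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def hadoop_native_stragglers(tasks, gap=0.2, min_runtime=60.0):
    """Original Hadoop rule: progress below the category average minus 0.2,
    after running for at least one minute (tasks = one category, e.g. maps)."""
    avg = sum(t.progress for t in tasks) / len(tasks)
    return [t for t in tasks if t.progress < avg - gap and t.elapsed >= min_runtime]


def late_pick_task(tasks, running_speculative, total_slots,
                   slow_task_q=0.25, spec_cap_frac=0.10):
    """Return the task LATE would back up next, or None."""
    if running_speculative >= spec_cap_frac * total_slots:  # SpeculativeCap reached
        return None
    running = [t for t in tasks if t.progress < 1.0]
    if not running:
        return None
    # SlowTaskThreshold: only tasks whose rate is below the 25th percentile qualify
    cutoff = percentile([t.progress_rate for t in running], slow_task_q)
    slow = [t for t in running if not t.speculated and t.progress_rate < cutoff]
    # Longest Approximate Time to End: back up the task expected to finish last
    return max(slow, key=lambda t: t.time_left, default=None)


def pick_backup_node(nodes, cpu_bound, slow_node_q=0.25):
    """HYPOTHETICAL load-aware choice of the node that runs the backup copy."""
    # Keep LATE's SlowNodeThreshold: skip nodes below the 25th percentile of total work
    cutoff = percentile([n.total_progress for n in nodes], slow_node_q)
    eligible = [n for n in nodes if n.total_progress >= cutoff]
    # Hypothetical workload class: weight the resource the task type stresses most
    w_cpu = 0.7 if cpu_bound else 0.3

    def load(n):
        return w_cpu * n.cpu_util + (1.0 - w_cpu) * n.io_util

    return min(eligible, key=load)


tasks = [Task("m1", 0.9, 100), Task("m2", 0.8, 100),
         Task("m3", 0.1, 10), Task("m4", 0.2, 100)]
nodes = [Node("n1", 5.0, 0.9, 0.1), Node("n2", 4.0, 0.1, 0.9),
         Node("n3", 4.5, 0.5, 0.5), Node("n4", 1.0, 0.05, 0.05)]
print([t.task_id for t in hadoop_native_stragglers(tasks)])  # ['m4']
print(late_pick_task(tasks, running_speculative=0, total_slots=12).task_id)  # m4
print(pick_backup_node(nodes, cpu_bound=True).name)   # n2
print(pick_backup_node(nodes, cpu_bound=False).name)  # n1
```

In the example, node `n4` has the lowest load but is excluded by the SlowNodeThreshold filter, and the CPU-bound and I/O-bound cases pick different backup nodes. In stock LATE the backup is launched on whichever node next asks for work (if that node passes SlowNodeThreshold). Choosing the node explicitly, as `pick_backup_node` does, is the kind of gap the improved algorithm targets.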
Keywords/Search Tags: big data, MapReduce, Hadoop, LATE