
Improve Parallelism Of Task Execution To Optimize Utilization Of MapReduce Cluster Resources

Posted on: 2016-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: L M Zheng
GTID: 2308330476953340
Subject: Computer Science and Technology
Abstract/Summary:
MapReduce is a programming model that has become an important solution for large-scale, data-intensive processing. It is widely used in fields such as Web search, machine learning, and e-commerce. Hadoop, an open-source implementation of MapReduce, is widely used for offline processing of massive data sets. It consists of two components: MapReduce and HDFS. In studying Hadoop, we found that its data parallelism is coarse-grained and cannot take full advantage of multi-core systems, which lowers the utilization and efficiency of the whole cluster. To turn Hadoop into a fine-grained data-parallel framework, we propose a strategy that scales the parallelism of execution within each map/reduce task. We implement the strategy as a new feature for Hadoop. Our experiments show that it not only improves the utilization of MapReduce cluster resources but also shortens job completion time by up to 3x.
Keywords/Search Tags:MapReduce, Parallelism, Resources Utilization, Multi-core, Subtasks
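
The idea of running one map task as several parallel subtasks can be illustrated with a minimal sketch. This is not the thesis's Hadoop implementation, which the abstract does not detail. The function names (`split_into_subtasks`, `map_subtask`, `run_map_task`), the word-count workload, and the use of a thread pool are all assumptions chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_subtasks(records, num_subtasks):
    """Partition one task's input records into roughly equal contiguous slices."""
    if num_subtasks < 1:
        raise ValueError("num_subtasks must be at least 1")
    size, extra = divmod(len(records), num_subtasks)
    slices, start = [], 0
    for i in range(num_subtasks):
        # The first `extra` slices each take one additional record.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty slices when there are few records
            slices.append(records[start:end])
        start = end
    return slices


def map_subtask(records):
    """Hypothetical map function: count words in one slice of the input."""
    counts = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def run_map_task(records, num_subtasks):
    """Run one map task as concurrent subtasks and merge their partial outputs."""
    slices = split_into_subtasks(records, num_subtasks)
    merged = Counter()
    with ThreadPoolExecutor(max_workers=num_subtasks) as pool:
        for partial in pool.map(map_subtask, slices):
            merged.update(partial)
    return merged
```

Calling `run_map_task(lines, 4)` produces the same counts as `run_map_task(lines, 1)`; only the degree of parallelism inside the task changes. In this sketch the threads merely stand in for the cores that a fine-grained task would occupy on a multi-core node.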