As an emerging cloud computing model, MapReduce has been widely used in large-scale data-intensive applications such as web crawling, scientific computing and data mining. The MapReduce library gives developers a development environment in which the underlying hardware is transparent, making data easy to store and use, and it removes much of the difficulty developers previously faced in operating the underlying parallel computing architecture. MapReduce-based systems offer independent storage, high scalability and fault tolerance. Despite these advantages, MapReduce's mechanisms are not yet mature, and resource scheduling has always been one of the biggest limitations on its execution efficiency. In this article, Hadoop, the well-known open-source implementation of MapReduce, is adopted for heterogeneous device environments and applications; the shortcomings of resource scheduling in MapReduce are summarized and several corresponding improvements are proposed. The main contributions are as follows.
(1) Building on MapReduce's original resource scheduling scheme, which was designed for homogeneous environments, a Dynamic Proportional Resource Scheduling algorithm (DPRS) is proposed. It dynamically monitors the load status of nodes, allocates task resources rationally, and alleviates the load imbalance of the original mechanism in heterogeneous environments.
(2) To ensure that data are processed on the local machine, a Local Computing Power Optimization (LCPO) model is proposed. It eliminates the backup overhead of the original mechanism, reduces network traffic and improves the efficiency of Map tasks in heterogeneous environments.
(3) To improve the efficiency of backup execution for straggling Reduce tasks and to resolve the misidentification of straggler nodes in heterogeneous environments, a Fast Long Task Backup algorithm (FLTB) is proposed.
(4) To address the uneven distribution of input data among Map tasks, a heuristic data partitioning method is employed to mitigate data imbalance in heterogeneous environments.
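The abstract does not specify how DPRS computes its proportions, so the following is only a minimal sketch of the general idea behind load-aware proportional allocation in a heterogeneous cluster. It is not the DPRS algorithm itself and not a Hadoop API. It assumes, hypothetically, that each node reports a relative capacity and a current utilisation in [0, 1]. Tasks are then split in proportion to each node's spare capacity, and the largest-remainder method is used so that the integer shares sum to the requested task count. The names `NodeStatus` and `proportional_allocation` are illustrative inventions.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical snapshot of one node's monitored state (not a Hadoop type)."""
    name: str
    capacity: float  # assumed relative computing power, e.g. measured task throughput
    load: float      # assumed current utilisation in [0, 1]


def proportional_allocation(nodes: list[NodeStatus], num_tasks: int) -> dict[str, int]:
    """Split num_tasks across nodes in proportion to their spare capacity.

    Spare capacity is assumed to be capacity * (1 - load). Integer shares are
    obtained with the largest-remainder method so they sum to num_tasks.
    """
    spare = {n.name: max(n.capacity * (1.0 - n.load), 0.0) for n in nodes}
    total = sum(spare.values())
    if total == 0:
        raise ValueError("no node has spare capacity")

    # Ideal (fractional) share for each node.
    quotas = {name: num_tasks * s / total for name, s in spare.items()}
    # Start from the floor of each share.
    shares = {name: int(q) for name, q in quotas.items()}
    leftover = num_tasks - sum(shares.values())

    # Hand the remaining tasks to the nodes with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda name: quotas[name] - shares[name], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    cluster = [
        NodeStatus("fast", capacity=4.0, load=0.25),   # spare 3.0
        NodeStatus("medium", capacity=2.0, load=0.5),  # spare 1.0
        NodeStatus("slow", capacity=1.0, load=0.0),    # spare 1.0
    ]
    print(proportional_allocation(cluster, 10))
```

Under these assumptions, the lightly loaded fast node receives the largest share. A real scheduler would re-run such a computation as monitored load changes, which is the "dynamic" aspect the abstract describes.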