Research On MapReduce Performance Optimization Based On Hadoop

Posted on:2018-09-08

Degree:Master

Type:Thesis

Country:China

Candidate:L L Feng

Full Text:PDF

GTID:2348330536979650

Subject:Computer application technology

Abstract/Summary:


With the continuous development of Internet technology, networks and enterprise production systems must process ever-growing volumes of data, and cloud computing has become a popular computing model for big data processing. Hadoop, an open-source cloud computing platform, quickly became the mainstream technology for big data processing. As Hadoop clusters have been widely deployed, their performance has attracted increasing attention. Load balancing plays an important role in cluster performance and is the focus of this thesis, which studies and analyzes the load balancing problem in MapReduce jobs with the aim of optimizing performance.

In a heterogeneous environment, nodes differ in computing capability. During MapReduce task scheduling, the task load is distributed unevenly, so individual nodes take too long to finish and delay the response time of the whole job. This thesis presents a task scheduling algorithm based on load balancing. By analyzing the characteristics of the tasks and the performance of the nodes in the heterogeneous cluster, the algorithm derives a task scheduling load balance metric that serves as the basis for assigning tasks to nodes, so that each node's workload matches its performance. During task execution, a node communication model is established to adjust the load dynamically, which keeps task scheduling balanced.

The default Hash partitioning mechanism in MapReduce causes data skew when processing data-intensive workloads. This thesis proposes a partition cost model to evaluate how well the partitions are balanced, together with a new fine-grained partitioning algorithm that increases the number of partitions and reduces the amount of skewed data in each partition; the partition cost model is then used to keep the amount of data received by each node relatively balanced.

Finally, an experimental environment is set up and corresponding experiments are designed to verify the task scheduling algorithm and the fine-grained partitioning algorithm, both of which aim to optimize cluster load balancing.
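The abstract does not specify the thesis's load balance metric or its assignment rules, so the following is only a hedged illustration of capacity-aware scheduling in a heterogeneous cluster. It substitutes a well-known greedy heuristic (largest task first, each to the node with the earliest estimated finish time, where finish time = assigned work / node capacity) and uses "maximum finish time / mean finish time" as a stand-in balance metric. The `Node` class, `schedule`, `imbalance`, the `capacity` values and the task costs are all hypothetical, and the run-time adjustment through the node communication model is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: float  # hypothetical relative speed (2.0 = twice as fast as 1.0)
    assigned: list = field(default_factory=list)  # task ids given to this node
    load: float = 0.0  # total estimated cost of the assigned tasks

    def finish_time(self, extra: float = 0.0) -> float:
        # Estimated completion time if `extra` units of work were added.
        return (self.load + extra) / self.capacity


def schedule(task_costs: dict[str, float], nodes: list[Node]) -> list[Node]:
    """Greedy sketch: take tasks from largest to smallest estimated cost and
    give each one to the node whose finish time after accepting it is lowest."""
    for task, cost in sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True):
        best = min(nodes, key=lambda n: n.finish_time(cost))
        best.assigned.append(task)
        best.load += cost
    return nodes


def imbalance(nodes: list[Node]) -> float:
    """Stand-in balance metric: max finish time / mean finish time (1.0 = even)."""
    times = [n.finish_time() for n in nodes]
    return max(times) / (sum(times) / len(times))


if __name__ == "__main__":
    cluster = schedule({"t1": 4, "t2": 3, "t3": 3, "t4": 2},
                       [Node("fast", 2.0), Node("slow", 1.0)])
    for n in cluster:
        print(n.name, n.assigned, n.finish_time())
    print("imbalance:", imbalance(cluster))
```

With these example costs the faster node receives three of the four tasks (`t1`, `t3`, `t4`) and the slower node receives `t2`, so both nodes' finish times stay close instead of the slow node becoming the straggler that delays the whole job.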
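To make the data skew problem concrete, the sketch below contrasts default-style hash partitioning with one common fine-grained scheme: the key space is hashed into `reducers * factor` micro-partitions, which are then assigned, largest first, to the currently least-loaded reducer. This is a swapped-in technique for illustration, not the thesis's algorithm, and the cost function (heaviest reducer load / mean load) is only a stand-in for its partition cost model. The hash formula `(hash & 0x7FFFFFFF) % reducers` mirrors Hadoop's `HashPartitioner`, but `zlib.crc32` stands in for Java's `hashCode()`. The function names, the `factor` parameter, and the assumption that per-key record counts come from a sampling pass are all hypothetical.

```python
import zlib
from typing import Callable


def stable_hash(key: str) -> int:
    # Stand-in for Java's hashCode(); Python's built-in hash() of str is
    # randomized per process, so a deterministic function is used instead.
    return zlib.crc32(key.encode("utf-8"))


def hash_partition(key_counts: dict[str, int], reducers: int,
                   h: Callable[[str], int] = stable_hash) -> list[int]:
    """Records per reducer under default-style hash partitioning."""
    loads = [0] * reducers
    for key, count in key_counts.items():
        loads[(h(key) & 0x7FFFFFFF) % reducers] += count
    return loads


def fine_grained_partition(key_counts: dict[str, int], reducers: int,
                           factor: int = 4,
                           h: Callable[[str], int] = stable_hash
                           ) -> tuple[list[int], list[int]]:
    """Hash keys into reducers*factor micro-partitions, then assign the
    micro-partitions (largest first) to the least-loaded reducer.
    Returns (reducer index per micro-partition, records per reducer)."""
    micro = [0] * (reducers * factor)
    for key, count in key_counts.items():
        micro[(h(key) & 0x7FFFFFFF) % len(micro)] += count
    assignment = [0] * len(micro)
    loads = [0] * reducers
    for m in sorted(range(len(micro)), key=lambda i: micro[i], reverse=True):
        target = loads.index(min(loads))  # least-loaded reducer so far
        assignment[m] = target
        loads[target] += micro[m]
    return assignment, loads


def partition_cost(loads: list[int]) -> float:
    """Stand-in cost: heaviest reducer load / mean load (1.0 = balanced)."""
    return max(loads) / (sum(loads) / len(loads))
```

A partitioner built on this would route a record with key `k` to `assignment[(h(k) & 0x7FFFFFFF) % (reducers * factor)]`. One limitation of this simple scheme: a single hot key still lands whole in one micro-partition, so it cannot reduce skew below the size of the largest key.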

Keywords/Search Tags:

MapReduce, load balancing, task scheduling, partition


Related items

1	Partition Of Task Type Based On Resources And Real-time Requirements And The Research Of Load Balancing On It
2	Research On Key Issues Of Task And Job Scheduling For MapReduce Clusters
3	Research On Stream Program Task Partition And Scheduling For Multi-core Processor
4	Task Scheduling And Load Balancing Methods In NOW
5	Research And Implementation Of Optimized Load Balancing Algorithm In Task Scheduling System
6	Research On Task Scheduling Based On Load Balancing And Task Overtime Rate
7	Research On Load Balancing And QoS Oriented Multi-objective Cooperative Task Scheduling In Cloud Environment
8	Research On Load Balancing Of Task Scheduling In Cloud Service System
9	Design And Implementation Of A Load Balancing System Based On PVM
10	Research Of Task Partition And Resource Allocation Algorithms For Load Balance In Spark Computing Environment