In recent years, cloud computing has been widely adopted for massive data processing because of its high processor performance, reliability, and scalability. Against the background of the network information explosion, massive data processing has become a new challenge in computer science. MapReduce is a distributed data processing programming model whose main feature is that it simplifies traditional distributed program development: developers need only focus on the business logic and do not have to consider the details of the distributed implementation. Hadoop, the open-source implementation of MapReduce, provides enterprises and research institutions with a foundational platform for massive data processing. The main purpose of Hadoop scheduling is to improve the utilization of cluster resources and to reduce the running time of users' jobs. Hadoop job scheduling in a cloud environment brings new challenges to academia and industry, and improving job scheduling is of great significance for the performance and resource utilization of Hadoop.

First, this paper introduces the concept and architecture of cloud computing. We examine the MapReduce programming model and the Hadoop Distributed File System (HDFS), and analyze the Hadoop job execution mechanism as well as the existing scheduling algorithms.

Second, because the Priority Based Weighted Round Robin algorithm does not consider the system's load level and cannot fully utilize the processing capacity of the compute nodes in heterogeneous clusters, this paper proposes an improved priority scheduling algorithm (Priority Based Multi Scale, PBMC). The PBMC scheduling algorithm classifies the cluster's nodes, which have distinct computing capacities, and sorts them according to computing ability.
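To make the MapReduce model described above concrete, the canonical word-count job can be sketched in plain Python. This is an illustration of the programming model only, not Hadoop's actual Java API; the function names here are chosen for exposition:

```python
from collections import defaultdict

def map_phase(document):
    # The user-supplied map function: emit (word, 1) pairs.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # The framework groups intermediate values by key; in Hadoop
    # this step is distributed and handled automatically.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # The user-supplied reduce function: sum the counts per word.
    return key, sum(values)

docs = ["big data big cluster", "data processing"]
intermediate = [pair for d in docs for pair in map_phase(d)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
# counts == {"big": 2, "data": 2, "cluster": 1, "processing": 1}
```

In a real Hadoop job the developer writes only the map and reduce functions, while the framework runs them in parallel across the cluster and handles the shuffle, which is precisely the simplification highlighted above.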
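The node classification and sorting step could be sketched roughly as follows. The node attributes, the capacity score, and the class boundaries here are hypothetical illustrations, not the actual PBMC formulation:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_cores: int   # hypothetical capacity indicators; PBMC's
    memory_gb: int   # real metric is not specified in the abstract

def capacity_score(node):
    # Illustrative weighting of compute resources.
    return node.cpu_cores * 2 + node.memory_gb

def classify_and_sort(nodes, n_classes=3):
    """Sort heterogeneous nodes by capacity, strongest first,
    then bucket them into roughly equal-sized classes."""
    ranked = sorted(nodes, key=capacity_score, reverse=True)
    size = max(1, -(-len(ranked) // n_classes))  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

cluster = [Node("n1", 4, 8), Node("n2", 16, 64), Node("n3", 8, 16)]
classes = classify_and_sort(cluster)
# classes[0] holds the most capable nodes (here n2)
```

A scheduler can then walk the classes in order, so that the most demanding or highest-priority work is offered to the strongest class first.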
The PBMC scheduling algorithm also considers the overall system load level, assigning higher-priority tasks to nodes with greater computing power. Experimental results show that PBMC fully accounts for the performance differences among the nodes in the cluster, reducing job completion time and improving the utilization of cluster resources.

Finally, by studying the job scheduling mechanism of Hadoop, and in view of the randomness and convergence of service requests, the reliability of the cloud computing system, and the problem of cluster resource utilization, we use a queuing model to build a model of the cloud computing system and use the compute nodes' load values to classify the reliability of the nodes. Based on this classification of node reliability, we propose a new job scheduling algorithm (Job Scheduling Based on Node Reliability, JSBNR). JSBNR puts forward a reliability evaluation model for compute nodes and, on top of it, a method for matching nodes and tasks. Experiments show that JSBNR improves the reliability of the cluster and the utilization of resources, while also scaling well.
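A load-based reliability classification of the kind JSBNR builds on might look like the following sketch. The load thresholds and the greedy task-to-node matching are assumptions made for illustration; the thesis derives its evaluation model from a queuing model of the cloud system:

```python
def reliability_class(load):
    """Map a node's load value in [0, 1] to a reliability class.
    The thresholds below are illustrative only."""
    if load < 0.5:
        return "high"
    elif load < 0.8:
        return "medium"
    return "low"

def match_tasks(tasks, node_loads):
    """Greedily send the most critical tasks to the most reliable
    (least loaded) nodes -- a simplified stand-in for JSBNR's
    node-task matching method."""
    by_reliability = sorted(node_loads, key=node_loads.get)  # least loaded first
    critical_first = sorted(tasks, key=lambda t: t["priority"], reverse=True)
    return {t["id"]: by_reliability[i % len(by_reliability)]
            for i, t in enumerate(critical_first)}

loads = {"n1": 0.9, "n2": 0.3, "n3": 0.6}
tasks = [{"id": "t1", "priority": 1}, {"id": "t2", "priority": 5}]
assignment = match_tasks(tasks, loads)
# the high-priority task t2 goes to the lightly loaded node n2
```

Classifying nodes by load before matching means a heavily loaded (low-reliability) node is only chosen when the more reliable classes are exhausted, which is how such a policy can raise both cluster reliability and resource utilization.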