Research And Improvement Of Self Adaptive Scheduling Algorithm In Cloud Platform

Posted on: 2015-12-04, Degree: Master, Type: Thesis
Country: China, Candidate: T Liu, Full Text: PDF
GTID: 2308330485990523, Subject: Software engineering
Abstract/Summary:
In recent years, with the rapid growth of cloud computing, many major companies have begun developing corresponding products and services. As of 2013, the global cloud computing market had exceeded $100 million. Domestically, cloud computing has also attracted strong attention in the financial, medical, education, telecommunications and electricity industries. Alongside fourth-generation (4G) networks, cloud computing is expected to see unprecedented development. Among its key techniques, resource scheduling has drawn the attention of most researchers. The effectiveness of scheduling affects both the overall performance of the cluster and its resource utilization. Current cloud computing environments are dynamic and heterogeneous and must handle massive numbers of tasks of many types. As cluster sizes and user QoS requirements grow, existing scheduling algorithms find it increasingly difficult to adapt to dynamic changes in the environment and to meet users' QoS needs. Improving the self-adaptive ability of the job scheduler and the user satisfaction it delivers can therefore make resource use more efficient and meet practical demands.

Building on extensive research, this paper first introduces the architecture of Hadoop, including the core techniques of MapReduce and HDFS. Second, following the job life cycle, it gives a detailed introduction to the job scheduling process in terms of job submission and initialization, JobTracker control, TaskTracker control and scheduler control. It then analyzes the existing Hadoop schedulers. The FIFO Scheduler is simple and low-cost, but it is not suitable for multiple job types sharing cluster resources and does not consider differences in user QoS. The Fair Scheduler overcomes the shortcomings of FIFO but does not consider data locality or job characteristics when killing tasks, which causes massive data movement and increases network load.
The Capacity Scheduler allocates resources based on job performance and is also suitable for multiple job types sharing cluster resources, but its allocation strategy easily falls into a local optimum. In addition, each of these schedulers requires static settings before running and cannot adjust its strategy dynamically based on job running status and resource utilization.

This paper puts forward a self-adaptive scheduling algorithm based on job classification. First, it classifies submitted jobs as "run" or "wait" using the naive Bayesian classification method. Second, it draws up an overload rule according to the different resource utilization of different job types; the result of the overload rule is fed back to the naive Bayesian classifier. For jobs with the same classification result, this paper introduces a "utilization function", which estimates the job finish time from the user's expected finish time submitted with the job and from the job's running status. The estimated value is then used to set the job priority. Finally, the proposed scheduler assigns the task with the highest priority. In this way, the scheduler learns from previous allocation results and adaptively influences the next round of allocation.

Finally, this paper implements the algorithm on the Hadoop platform. Results show that, compared with the traditional schedulers, the proposed one spends more time classifying jobs but achieves higher data locality and better self-adaptivity. Compared with the original adaptive scheduler, the improved one raises scheduling efficiency and CPU utilization, shortens response time and improves user QoS. Cluster resources are used more extensively.
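The classify, feed back, and prioritize loop described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class `JobClassifier`, the discrete feature names, the smoothing constants, and the `priority` formula (estimated total run time divided by the user's expected finish time) are all assumptions chosen for illustration, since the abstract does not give the actual overload rule or utilization function.

```python
import math
from collections import defaultdict


class JobClassifier:
    """Naive Bayes over discrete job features with labels 'run' and 'wait'.

    Hypothetical sketch: feature names and smoothing are assumptions.
    """

    def __init__(self):
        # Counts start at 1 so that both labels have a non-zero prior.
        self.label_counts = {"run": 1, "wait": 1}
        self.feature_counts = {"run": defaultdict(int), "wait": defaultdict(int)}

    def classify(self, features):
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            for name, value in features.items():
                seen = self.feature_counts[label][(name, value)]
                # Add-one smoothing, assuming two possible values per feature.
                score += math.log((seen + 1) / (count + 2))
            scores[label] = score
        return max(scores, key=scores.get)

    def feedback(self, features, overloaded):
        """Feed the overload-rule outcome back as a training example."""
        label = "wait" if overloaded else "run"
        self.label_counts[label] += 1
        for name, value in features.items():
            self.feature_counts[label][(name, value)] += 1


def priority(elapsed, progress, expected_finish):
    """Assumed urgency score: estimated total time / user's expected time.

    `progress` is the completed fraction of the job, in (0, 1].
    """
    estimated_total = elapsed / progress
    return estimated_total / expected_finish


def pick_next(jobs, classifier):
    """Among jobs classified as 'run', return the one with highest priority."""
    runnable = [j for j in jobs if classifier.classify(j["features"]) == "run"]
    if not runnable:
        return None
    return max(
        runnable,
        key=lambda j: priority(j["elapsed"], j["progress"], j["expected_finish"]),
    )
```

In this sketch, each scheduling round calls `feedback` with the observed overload outcome, so later calls to `classify` shift toward "wait" for job profiles that previously overloaded the node. This mirrors the adaptive behaviour the abstract describes only in outline.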
Keywords/Search Tags: cloud computing, Hadoop, adaptive scheduling, job classification, QoS