
Research On The Job Scheduling Policy And Cluster Management System In High Performance Computing

Posted on: 2016-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhou
GTID: 2272330479998235
Subject: Signal and Information Processing
Abstract/Summary:
Since its emergence, computational science has achieved a great deal in scientific research, engineering, and military applications. High performance computing (HPC) has attracted worldwide attention during this development because of its usability, flexibility, and platform independence, and it is also the foundation of several computational disciplines. As a result, cluster technology for HPC is receiving growing attention from research institutions in China.

A cluster management system covers three areas: resource management, job management, and user management. This thesis reviews research results in these three areas, together with the origin and growth trend of HPC and its advantages over earlier mainframe computers. It then introduces a typical HPC cluster in order to analyze the structure of an actual HPC platform.

This thesis is organized as follows:

1. The development of cluster hardware architecture is introduced. The composition of a Linux HPC cluster is then analyzed, together with the functions of the nodes and networks used in clusters and the software structure of clusters.
2. The cluster management software CCLAB, developed in Python on the Django framework, is presented. CCLAB is built on the resource manager Torque and the job scheduler Maui, and it uses the monitoring software Ganglia to monitor the cluster's resource utilization.
3. The development of the three portlets, “Job Scheduling”, “Cluster Users”, and “Cluster Monitoring”, is analyzed.
The advantages of the second-generation cluster management system gateway are also described, followed by the implementation of the Web views and the URL design in CCLAB, the GPFS parallel file system, and the remote power management method used in this work.
4. The classification and procedure of job scheduling are presented. The job scheduling process, as performed by the job scheduler Maui, is abstracted and described with mathematical models, and a DAG diagram illustrates the procedure. The advantages and disadvantages of the FCFS, Priority, Firstfit, and Bestfit policies are discussed. Based on this analysis, a new policy named “BLPRB” is proposed, which extends earlier research on single and dual policies. The thesis analyzes node balance evaluation, the determination of job priority, reserved-resource information, and the job backfilling procedure. Finally, an algorithmic analysis of the BLPRB policy is given; the policy can determine the latest execution time of reserved jobs and resolves the starvation of large jobs. The policy is intended to be integrated into the Maui scheduler and implemented on the HPC platform that was built.

The new policy was deployed on an actual HPC platform. The results show a clear improvement over the Firstfit and FCFS policies: the maximum reduction in job response time is 26.17% and 25.99%, respectively; the maximum improvement in throughput rate is 30.77% and 54.55%; and the maximum reduction in average job waiting time is 35.22% and 60.58%. CCLAB considerably reduces the workload of cluster administrators and has practical value, and the proposed policy improves the resource utilization rate to some extent.
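
The abstract does not give the details of BLPRB, so the following is not the thesis's algorithm. It is a minimal Python sketch of the two ideas the abstract names, strict FCFS and backfilling behind a reservation for the head job, under simplifying assumptions: all jobs are submitted at time 0, requested runtimes are exact, and the `Job`, `simulate`, and `average_wait` names and the four-node example workload are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str      # assumed unique
    nodes: int     # nodes requested
    runtime: int   # requested wall time, assumed exact


def simulate(jobs, total_nodes, backfill=False):
    """Return {job name: start time}; all jobs are submitted at t=0 in queue order."""
    if any(j.nodes > total_nodes for j in jobs):
        raise ValueError("a job requests more nodes than the cluster has")
    queue, running = list(jobs), []   # running: heap of (end_time, nodes)
    free, now, starts = total_nodes, 0, {}

    def start(job):
        nonlocal free
        starts[job.name] = now
        free -= job.nodes
        heapq.heappush(running, (now + job.runtime, job.nodes))

    while queue:
        # FCFS: start jobs from the head of the queue while they fit.
        while queue and queue[0].nodes <= free:
            start(queue.pop(0))
        if not queue:
            break
        if backfill:
            # Reserve for the blocked head job: find the earliest time
            # ("shadow" time) at which enough nodes will be free for it.
            head, avail, shadow = queue[0], free, now
            for end, n in sorted(running):
                avail, shadow = avail + n, end
                if avail >= head.nodes:
                    break
            extra = avail - head.nodes  # nodes the head job will not need
            for job in list(queue[1:]):
                if job.nodes > free:
                    continue
                ends_in_time = now + job.runtime <= shadow
                # Backfill only if the head job's reservation is not delayed.
                if ends_in_time or job.nodes <= extra:
                    queue.remove(job)
                    start(job)
                    if not ends_in_time:
                        extra -= job.nodes
        # Advance the clock to the next job completion.
        now, released = heapq.heappop(running)
        free += released
    return starts


def average_wait(starts):
    """Average waiting time; equals the mean start time since all submit at t=0."""
    return sum(starts.values()) / len(starts)


# Hypothetical workload on a 4-node cluster.
jobs = [Job("A", 2, 10), Job("B", 4, 5), Job("C", 2, 8), Job("D", 1, 20)]
fcfs = simulate(jobs, total_nodes=4)
easy = simulate(jobs, total_nodes=4, backfill=True)
print(average_wait(fcfs), average_wait(easy))
```

In this toy workload, job C is backfilled into the two idle nodes while the large job B waits, and B still starts at its reserved time. Guaranteeing the head job's start time in this way is one common approach to the starvation of large jobs that the abstract mentions.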
Keywords/Search Tags: Job-scheduling, Torque, customized-priority, backfilling-policy, BLPRB policy