The Design And Implementation Of Cross-Domain Parallel Jobs' Resource Co-Allocation In Grid Environment

Posted on: 2010-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: S C Xing
GTID: 2178360272997583
Subject: Computer system architecture
Abstract/Summary:
Research on Grid Computing has become increasingly popular, with applications in fields such as the life sciences, high-energy physics, and aerospace. The word "Grid" is derived from the power grid; the aim is to let users use computer resources as easily as possible. The model brings together network resources, such as computing, storage, data, communication, and software resources, and shares them securely and transparently to form a large, high-performance global computer system, thereby eliminating isolated islands of information and resources.

Before Grid Computing came into wide use, a parallel program was usually executed on a single cluster to obtain high computing speed. With the development of Grid Computing, more and more parallel applications are expected to run in the grid environment to achieve higher performance. However, these applications previously ran on a single cluster whose nodes shared the same machine architecture, so migrating them to the autonomous, heterogeneous, distributed grid environment causes problems such as synchronization waiting, deadlock, and load imbalance. These problems become serious during cross-domain resource co-allocation and greatly reduce the efficiency of parallel jobs.

For the resource co-allocation of cross-domain parallel jobs, existing approaches such as DOROC and Resource Reservation cannot solve these problems effectively. The main reason is the lack of a unified cross-domain resource manager. This paper therefore proposes the Virtual Job Model (VJM), which manages heterogeneous resources and parallel jobs in the grid environment. The model sits in the meta-scheduling layer; it co-allocates grid resources for parallel jobs and avoids deadlock.
VJM does not depend on a resource reservation mechanism, so it can work through the GRAM protocol with the major existing local schedulers, such as OpenPBS and SGE, which do not support resource reservation. The Resource Selection Algorithm selects an optimized set of clusters by computing the minimal waiting time of each job so as to minimize the overall resource co-allocation time, which effectively addresses the problem of long synchronization waits. In addition, the Resource Reorganization Algorithm reduces the resource waste caused by deadlock and improves resource utilization.

In VJM, a parallel job's resource request is not satisfied immediately, nor are its sub-jobs submitted immediately; both are handled after VJM has scheduled the resources. First, whenever a parallel job's resource request arrives, VJM calculates the load of all candidate clusters from the VJobs' log file and determines the target cluster of each VJob using the formula of the Resource Selection Algorithm. VJM then distributes the VJobs according to the result. The main function of a VJob is to occupy a resource and report information about that resource back to the Virtual Job Control Center (VJC). The VJC co-allocates resources synchronously on the basis of this information and finally dispatches the real jobs to the virtual jobs that have occupied resources.

Grid resources are distributed across many administrative domains, each of which enforces its own local policies for external jobs. These domains are therefore treated as black boxes: detailed load information is generally unknown to grid users and hard to measure, which makes it difficult to produce a reasonable resource co-allocation. Because the waiting time of jobs at a cluster reflects that cluster's load, we measure the load by each cluster's average waiting time for external jobs, which can be obtained from the VJobs' feedback information.
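The load measure described above can be illustrated with a minimal Python sketch. It is not the thesis's actual formula, which the abstract does not give. The record layout `(cluster, submit_time, start_time)` and the function names `average_wait_times` and `rank_clusters` are assumptions made for illustration only; the sketch simply averages the observed waiting time of external jobs per cluster and orders the clusters from least to most loaded.

```python
from collections import defaultdict


def average_wait_times(log_records):
    """Return {cluster: mean waiting time} from VJob feedback records.

    Each record is a hypothetical (cluster, submit_time, start_time) tuple;
    the real VJob log format is not specified in the abstract.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cluster, submit_time, start_time in log_records:
        totals[cluster] += start_time - submit_time
        counts[cluster] += 1
    return {cluster: totals[cluster] / counts[cluster] for cluster in totals}


def rank_clusters(log_records):
    """Order clusters by ascending average waiting time (ties by name)."""
    averages = average_wait_times(log_records)
    return sorted(averages, key=lambda cluster: (averages[cluster], cluster))


if __name__ == "__main__":
    records = [("A", 0, 10), ("A", 0, 30), ("B", 0, 5)]
    print(average_wait_times(records))
    print(rank_clusters(records))
```

A ranking of this kind only captures the waiting-time part of the selection; the number of clusters involved would have to be weighed in separately.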
Another important evaluation parameter is the number of clusters involved. While a parallel application is running, its sub-jobs need to communicate with one another, and the more clusters are involved, the higher the cross-domain communication overhead. VJM's VJob resource selection algorithm uses both the log information and the number of candidate clusters to measure the distribution cost of each VJob, thereby improving the efficiency of resource co-allocation.

VJM avoids resource allocation deadlock through unified resource management. However, the concurrent submission of parallel jobs and the independence of grid nodes can still cause deadlock between VJM instances, or between VJM and other resource co-allocators. We therefore propose a life-cycle-based management mechanism to detect deadlock. A VJob should not hold resources permanently: when a VJob reaches the end of its life cycle and the resource co-allocation is still not complete, we consider that deadlock has occurred among several parallel jobs, and resources must be released to break it.

When resources are released, however, other parallel jobs may not yet have obtained enough resources, so not all of the resources need to be released. The Resource Reorganization Algorithm exchanges resources among different parallel jobs of the same submitter. Exchanging between a VJob that must release its resource and a VJob that has not yet obtained one reduces the waste caused by releasing resources and lets a VJob submitted later obtain resources ahead of time, so the algorithm improves resource utilization to a certain extent.

The implementation can be used independently or as a resource module within a meta-scheduler. The VJC consists of the following components: RequestHandler, CerManager, VJobManager, VJobPool, VJobDeliver, and VJobControllor.
RequestHandler receives resource requests and user control instructions and forwards them to the relevant internal components. CerManager is in charge of authenticating resource users and resource requesters. VJobManager is the core manager of VJobs; the Resource Selection Algorithm and the Resource Reorganization Algorithm are implemented in this component. VJobPool is the container that stores VJob information. A VJobProxy is the information unit corresponding to one VJob and contains its resource information and status. VJobDeliver distributes VJobs to their target clusters through the GRAM protocol. VJobControllor handles communication between the VJC and the VJobs, such as control information and VJob status information.

This paper analyses the problems that arise when a parallel job runs in a grid environment and designs the Virtual Job Model (VJM), which supports synchronous cross-domain resource allocation. VJM sits in the scheduling layer and manages grid parallel jobs and heterogeneous resources. By using virtual jobs, VJM can run synchronous parallel jobs on cross-domain, heterogeneous resources while avoiding deadlock, and the Resource Selection Algorithm and Resource Reorganization Algorithm reduce resource waste and improve resource utilization. Moreover, because it does not depend on resource reservation, VJM can work with almost all kinds of local schedulers via the standard Grid Resource Allocation and Management (GRAM) protocol. We have validated the soundness of VJM with a parallel application based on MPICH-G2.

In future research, we will study parallel applications from various scientific fields in depth, summarize their general features, and develop VJM's scheduling algorithm based on those features to improve application performance.
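The life-cycle-based deadlock detection and the resource reorganization step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the `VJob` fields, the `shortfall` mapping (parallel job id to the number of VJobs it still lacks), and the functions `is_expired` and `reorganize` are hypothetical names, and all jobs passed in are assumed to belong to the same submitter.

```python
from dataclasses import dataclass


@dataclass
class VJob:
    job_id: str          # parallel job this VJob currently serves (assumed field)
    cluster: str         # cluster where the VJob holds a resource
    started_at: float    # time at which the VJob occupied the resource
    max_lifetime: float  # life-cycle limit after which deadlock is assumed


def is_expired(vjob, now):
    """A VJob that has reached its life-cycle limit must give up its resource."""
    return now - vjob.started_at >= vjob.max_lifetime


def reorganize(held, shortfall, now):
    """Split held VJobs into (kept, handed_over, released).

    An expired VJob is handed to another parallel job that still lacks
    resources, if there is one; otherwise its resource is released.
    """
    remaining = dict(shortfall)  # copy so the caller's mapping is untouched
    kept, handed_over, released = [], [], []
    for vjob in held:
        if not is_expired(vjob, now):
            kept.append(vjob)
            continue
        taker = next(
            (job for job, need in remaining.items()
             if need > 0 and job != vjob.job_id),
            None,
        )
        if taker is None:
            released.append(vjob)
        else:
            remaining[taker] -= 1
            handed_over.append((vjob, taker))
    return kept, handed_over, released
```

In this sketch a handed-over resource is never released to the local scheduler, which is the saving the reorganization step is meant to capture; how the real VJC re-labels a VJob for its new owner is not described in the abstract.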
Keywords/Search Tags: Grid Computing, Virtual Job, Cross-Domain, Resource Co-allocation