
Network and CPU co-allocation in high throughput computing environments

Posted on: 2002-01-03
Degree: Ph.D.
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Basney, James Alan
Full Text: PDF
GTID: 1468390011490562
Subject: Computer Science
Abstract/Summary:
A High Throughput Computing (HTC) environment delivers large amounts of computing capacity to its users over long periods of time by pooling available computing resources on the network. The HTC environment strives to provide useful computing services to its customers while respecting the various resource usage policies set by the many different owners and administrators of the computing resources. This requires a flexible scheduling mechanism that matches jobs with compatible computing resources according to each job's needs and the attributes and policies of the available resources. It also requires mechanisms that help jobs be more agile, so they can successfully compute on the resources currently available to them. A checkpoint or migration facility enables long-running jobs to compute productively on non-dedicated resources. The work the job performs with each allocation is saved in a checkpoint, so the job's state can be transferred to a new execution site where it can continue the computation. A remote data access facility enables jobs to compute on resources that are not co-located with their data. Remote data access might involve transferring the job's data across a local area supercomputer network or a wide area network. These checkpoint and data transfers can generate significant network load.
The HTC environment must manage network resources carefully to use computational resources efficiently while honoring administrative policies. This dissertation explores the network requirements of batch jobs and presents mechanisms for managing network resources to implement administrative policies and improve job goodput. Goodput represents the job's forward progress and can differ from the job's allocated CPU time because of network overheads (when the job blocks on network I/O) and checkpoint rollback (when the job must “roll back” to a previous checkpoint).
The primary contribution of this work is the definition and implementation of a network and CPU co-allocation framework for HTC environments. Making the network an allocated resource enables the system to implement administrative network policies and to improve job goodput via network admission control and scheduling.
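The distinction the abstract draws between allocated CPU time and goodput can be illustrated with a small calculation. The sketch below is not taken from the dissertation: the `Allocation` record, its field names, and the rule that an allocation whose work was never checkpointed contributes nothing are simplifying assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """One CPU allocation granted to a job (hypothetical model, times in seconds)."""
    allocated_cpu_s: float    # time the job held the CPU
    network_blocked_s: float  # part of that time spent blocked on network I/O
    checkpointed: bool        # True if the work was saved before the allocation ended


def goodput(allocations):
    """Forward progress in seconds under the simplifying assumptions above."""
    total = 0.0
    for a in allocations:
        if not a.checkpointed:
            # Assumption: without a checkpoint the job rolls back,
            # so this allocation's work is lost entirely.
            continue
        # Time blocked on network I/O is allocated but not productive.
        total += a.allocated_cpu_s - a.network_blocked_s
    return total


history = [
    Allocation(allocated_cpu_s=3600, network_blocked_s=300, checkpointed=True),
    Allocation(allocated_cpu_s=1800, network_blocked_s=120, checkpointed=False),
]
allocated = sum(a.allocated_cpu_s for a in history)
print(f"allocated: {allocated:.0f} s, goodput: {goodput(history):.0f} s")
```

In this made-up history the job holds a CPU for 5400 seconds but makes only 3300 seconds of forward progress: 300 seconds are lost to blocking on the network and the whole second allocation is lost to rollback.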
Keywords/Search Tags:Network, Computing, Environment, CPU, HTC, Policies, Job, Resources