
Job Fault Tolerance And Migration Of Seismic Data Processing System

Posted on: 2015-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 2180330473951770
Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, high-performance computing systems have grown rapidly. Their structure has become increasingly complex and their resource scale increasingly large, and the probability that some component in the system fails has risen dramatically. Applying fault-tolerance techniques to such systems to ensure high availability has therefore become increasingly important.

The existing seismic data processing system is powerful and supports complex processing jobs, but it cannot cope with the various kinds of job failure and provides only the simplest form of fault tolerance, so its processing efficiency in practice is low. In addition, although static scheduling is used to select nodes for a job, the load across nodes can still become uneven during actual execution, which lowers system efficiency and lengthens job run time.

The purpose of this thesis is to design and implement subsystems that provide fault tolerance and dynamic job migration for the existing seismic data processing system. Based on an in-depth study of fault-tolerance and process-migration techniques, these functions are added to and improved on top of the existing system. The main work is as follows.

First, this thesis designs and implements an automatic job checkpoint fault-tolerance subsystem. Building on existing checkpoint and rollback-recovery techniques, a user-level checkpoint system is designed and implemented for the centralized and distributed execution control systems, and an application-level checkpoint system is designed and implemented to suit the driven execution pattern of single-channel processing jobs.

Second, this thesis designs and implements a dynamic job migration subsystem. During job execution, job processes are moved from heavily loaded nodes to idle nodes, where they continue to run, keeping the cluster data processing system balanced. This complements static scheduling, improving cluster efficiency and maintaining load balance.

Third, this thesis reports experimental tests. The tests show that checkpoint and rollback-recovery operations in the job checkpoint subsystem cost little time and storage, do not interfere with normal job execution, and effectively increase the availability of the data processing software. The job migration subsystem, by transferring job processes between nodes, further reduces job completion time and ensures effective use of system resources.

In conclusion, the job checkpoint system effectively addresses job failures caused by node breakdowns, greatly improves the efficiency of fault handling, and reduces the time spent re-executing jobs. The job migration subsystem effectively maintains dynamic load balance in the cluster and further improves job execution efficiency.
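
To make the idea of application-level checkpointing with rollback recovery more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the thesis's implementation: the function names (`save_checkpoint`, `load_checkpoint`, `process_channel`, `run_job`), the JSON checkpoint format, the checkpoint interval, and the simulated failure are all hypothetical. The job processes one channel at a time, periodically records how far it has progressed, and resumes from the last saved position after a restart.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in one step, so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"next_channel": 0, "results": []}
    with open(path) as f:
        return json.load(f)


def process_channel(samples):
    """Hypothetical stand-in for a real per-channel processing step (here: the mean)."""
    return sum(samples) / len(samples)


def run_job(channels, ckpt_path, interval=2, fail_at=None):
    """Process channels in order, checkpointing every `interval` channels.

    `fail_at` is an assumption used only to simulate a node failure.
    """
    state = load_checkpoint(ckpt_path)  # rollback recovery: resume from the last checkpoint
    for i in range(state["next_channel"], len(channels)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated node failure")
        state["results"].append(process_channel(channels[i]))
        state["next_channel"] = i + 1
        if state["next_channel"] % interval == 0:
            save_checkpoint(ckpt_path, state)
    save_checkpoint(ckpt_path, state)  # final checkpoint marks the job as complete
    return state["results"]
```

In this sketch, a failed run loses at most `interval - 1` channels of work: rerunning `run_job` with the same checkpoint path recomputes only the channels processed since the last saved checkpoint. A smaller interval reduces repeated work at the price of more frequent checkpoint writes.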
Keywords/Search Tags: High-performance computing, Seismic data processing, Checkpoint technology, Job migration