Research on Checkpoint-Based Fault-Tolerant Technology Based on VxWorks

Posted on: 2015-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Li
Full Text: PDF
GTID: 2268330428997802
Subject: Computer software and theory
Abstract/Summary:
Checkpointing is an important fault-tolerance technique that is widely used in distributed and cluster systems. In a message-passing system, checkpointing periodically saves the running state of each process to a reliable storage device as a checkpoint file, so that when a process fails it can be restored quickly from the stored checkpoint. This reduces the amount of computation lost, since the process no longer has to rerun from the beginning up to the point of failure.

Coordinated checkpointing, a branch of checkpointing technology, maintains a globally consistent state across checkpoints by coordinating checkpoint setting among the processes. The performance of a fault-tolerant technology is usually evaluated by its fault-tolerant overhead, which is divided at the failure point into fault-free overhead and fault-recovery overhead. Coordinated checkpointing achieves a low fault-recovery overhead thanks to its globally consistent state, but the coordination between processes increases the fault-free overhead.

We present a non-blocking coordinated checkpoint algorithm with O(n) complexity in this paper. With the help of a globally shared message channel, it prevents messages received by processes from producing an inconsistent system state. The algorithm uses a single-phase commit protocol instead of the two-phase blocking commit protocol of traditional blocking coordinated checkpointing, and reduces the complexity of coordination messages from O(n^2) to O(n), which decreases the fault-free overhead. Because the algorithm is non-blocking, a process does not need to stop executing or sending messages while taking a checkpoint; it can continue handling subsequent messages without waiting, which improves the processing speed and real-time performance of the system.
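To make the O(n^2) versus O(n) comparison concrete, the sketch below counts coordination messages under two assumed patterns. The abstract does not spell out the exact message pattern of either protocol, so both functions are illustrative assumptions: the quadratic case assumes every process notifies every other process, and the linear case assumes a single initiator sends one request to each peer.

```python
def all_to_all_messages(n: int) -> int:
    """Assumed quadratic baseline: each of n processes notifies the other n - 1."""
    return n * (n - 1)


def single_phase_messages(n: int) -> int:
    """Assumed linear case: one initiator sends one request to each of n - 1 peers."""
    return n - 1


if __name__ == "__main__":
    # Print the two counts side by side for a few system sizes.
    for n in (4, 16, 64):
        print(n, all_to_all_messages(n), single_phase_messages(n))
```

Under these assumptions the gap grows with the number of processes, which is why a linear coordination scheme lowers fault-free overhead as the system scales.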
To preserve the non-blocking property of the algorithm, each process saves its checkpoint file to the storage device independently. However, a fault that occurs during the checkpoint-setting procedure can leave the stored checkpoint file inconsistent, so we adopt a dual checkpoint file mechanism to avoid such inconsistency.

The non-blocking approach greatly improves process autonomy, but it weakens the global state of the checkpoints from strongly consistent to merely consistent, because in-transit messages may be present. We therefore combine log-based technology with checkpointing to ensure that the system is recoverable. In line with the coordinated checkpoint algorithm, the logging mechanism records only the messages that follow the trigger point of the checkpoint-setting procedure, which avoids the need for garbage collection.

The checkpoint-based fault-tolerant solution is built on VxWorks, an embedded real-time operating system with good reliability and real-time properties. Within this operating system, the solution improves the efficiency of file storage and message transmission: a tape-type storage scheme speeds up file storage, and memory management reduces the amount of data copied in the message queue to improve transmission efficiency. Finally, three simple tests verify the feasibility of the checkpoint-based fault-tolerant solution.
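The dual checkpoint file idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation, whose details the abstract does not give. It assumes two alternating slot files, a sequence number, and a CRC32 prefix for detecting a torn write; the class name `DualCheckpoint` and the file names `ckpt_a` and `ckpt_b` are hypothetical.

```python
import json
import os
import tempfile
import zlib


class DualCheckpoint:
    """Hypothetical two-slot checkpoint store: a write never touches the newest valid slot."""

    def __init__(self, directory):
        self.paths = [os.path.join(directory, "ckpt_a"),
                      os.path.join(directory, "ckpt_b")]

    def _read(self, path):
        # Return the decoded record, or None if the slot is missing or corrupt.
        try:
            with open(path, "rb") as f:
                raw = f.read()
        except FileNotFoundError:
            return None
        if len(raw) < 4:
            return None
        crc, payload = int.from_bytes(raw[:4], "big"), raw[4:]
        if zlib.crc32(payload) != crc:
            return None
        return json.loads(payload)

    def load(self):
        # Pick the valid record with the highest sequence number, if any.
        valid = [r for r in map(self._read, self.paths) if r is not None]
        return max(valid, key=lambda r: r["seq"]) if valid else None

    def save(self, state):
        latest = self.load()
        seq = latest["seq"] + 1 if latest else 1
        # Alternating by parity means the target is always the slot
        # that does NOT hold the newest valid checkpoint.
        target = self.paths[seq % 2]
        payload = json.dumps({"seq": seq, "state": state}).encode()
        with open(target, "wb") as f:
            f.write(zlib.crc32(payload).to_bytes(4, "big") + payload)
            f.flush()
            os.fsync(f.fileno())
        return seq


if __name__ == "__main__":
    store = DualCheckpoint(tempfile.mkdtemp())
    store.save({"step": 1})
    store.save({"step": 2})
    print(store.load())
```

If a fault interrupts a write, the damaged slot fails its CRC check and `load` falls back to the other slot, so at least one consistent checkpoint remains available for recovery.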
Keywords/Search Tags: Fault-tolerant technology, Non-blocking coordinated checkpoint, VxWorks, Tape-type storage