Software-based Techniques For Soft Error Detection

Posted on: 2018-12-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J C Ma
Full Text: PDF
GTID: 1362330545464261
Subject: Computer application technology
Abstract/Summary:
Single Event Effects (SEEs) are caused by energetic particles in radiation environments. The transient errors caused by SEEs are called soft errors. Soft errors have a great influence on the computing reliability of space devices. As the number of transistors on a chip increases, the soft error rate will grow with Moore's Law, exacerbating the challenge of computing reliability. Error detection is a crucial step toward soft error mitigation. Hardware-based detection methods alone cannot meet the requirements of soft error mitigation. Owing to their advantages in cost, independence and configurability, software-based detection methods are receiving more and more attention in soft error mitigation research.

This dissertation focuses on software-based detection methods against soft errors. Its purpose is to provide software-based detection methods that achieve high detection rates at low cost. Once these methods are applied, the reliability of aerospace computing can be improved without requiring any changes to the hardware. Among the outcomes of soft errors, silent data corruption (SDC) is the hardest one to detect. When SDC occurs, the program generates erroneous output without any indication. This dissertation proposes a propagation theory and detection methods against SDC, then evaluates the designed detection methods by fault injection. The main contributions of this dissertation are summarized as follows:

(1) The effects of soft errors on stack behavior are analyzed and several observations about SDC are drawn. The execution of a program is composed of a series of procedure calls, and calls are usually implemented by using a stack. The processor provides three pointers for stack operations: the stack pointer, the stack-frame base pointer and the return address. The stack pointer is contained in the ESP register and the stack-frame base pointer is contained in the EBP register. ESP and EBP are often used for addressing and the return address determines
the control flow after the RET instruction, so ESP, EBP and the return address are important to the correctness of the program. To the best of our knowledge, the stack behavior has not been characterized in prior work. A series of fault injection experiments are conducted to characterize ESP, EBP and the return address. Experimental results show that injections on ESP lead to SDC only if the flipped ESP points to another return address when the RET instruction is executed. The injected bits of these SDC cases are distributed over particular bit positions, and the timing of injection affects the results of injection. Hang cases of injections on RET-control EBP are caused by return cycles, and the essential conditions for the occurrence of a return cycle are obtained.

(2) A novel method of identifying SDC-causing instructions by fault propagation analysis is proposed, and the distribution and propagation characteristics of SDC-causing instructions are obtained. An instruction is an SDC-causing instruction if an error in its operand can cause SDC. The design and improvement of SDC detectors often need a profile of SDC-causing instructions. In the state of the art, a huge number of faults must be injected to locate SDC-causing instructions, which incurs a prohibitive time cost. A data dependence graph is built to capture the dependencies among the values of instructions. The inter-function and intra-function propagation that leads to SDC is analyzed and a sufficient condition for SDC-causing instructions is demonstrated. Further, a novel method of identifying SDC-causing instructions is proposed. Taking advantage of the trace files of injections, our method can detect underlying SDC-causing instructions without any expensive computation. Validation shows that our method yields high accuracy and coverage with a great reduction in injection cost. After analyzing the SDC-causing instructions, we find that only a small fraction of static instructions or sections of source code cause
most SDC cases. Moreover, the critical program points of fault propagation are connector instructions and branch instructions. These conclusions guide the strategic placement of detectors.

(3) An approach for detecting SDC by using program invariants, which originate from software testing, is proposed. A program invariant is a set of properties of a program. Normally, the invariant holds at runtime. When a soft error occurs, however, the invariant is often violated. Based on this principle, invariant-based assertions are inserted into the source code. An exception thrown by an assertion indicates that a soft error has been detected. By analyzing the propagation of faults that lead to SDC, the locations where assertions are embedded are selected and then invariants are extracted. Some of the invariants are converted to assertions based on their permeability, which indicates their capability of detecting soft errors. The proposed approach is evaluated by fault injection experiments, which show that it achieves high coverage with low overhead. The SDC detection rate of the proposed approach is 21% higher than that of FaultScreening at nearly the same cost. By applying this approach, version 1.0 of a program-hardening system called Radish is implemented. Radish enhances the resilience of a program to soft errors by inserting invariant-based assertions into the source code. The proposed approach provides novel detectors that contain more types of relationships and achieve a higher detection rate, broadening the ways of detecting SDC.

(4) The assertions generated by Radish cannot monitor all variables and program points; thus certain faults might propagate through unprotected code sections. To address this problem, a software-based instruction duplication mechanism is introduced. Compared with assertions, the software-based instruction duplication mechanism detects soft errors at a finer level of granularity, so it can protect code sections that are not covered by
assertions. Experiments show that adding the software-based instruction duplication mechanism increases the SDC detection rate by 15.5% compared with pure assertions. By applying the software-based instruction duplication mechanism, version 2.0 of the program-hardening system, called Radish_D, is implemented. Radish_D extends Radish by adding a module that implements the instruction duplication mechanism. Radish_D produces executable files with invariant-based assertions and the instruction duplication mechanism.
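
The two detector types described in contributions (3) and (4) can be illustrated with a small, hand-written sketch. It is not Radish or Radish_D, and it does not reproduce the dissertation's experiments. It assumes a toy program (a running sum), a hypothetical range invariant (every input item lies in [0, MAX_ITEM], so the sum is bounded), a 32-bit variable, and a single-bit-flip fault model applied to the accumulator only. All names (flip_bit, checked_sum, campaign, SoftErrorDetected) are invented for illustration.

```python
"""Toy single-bit-flip campaign: range invariant vs. duplicated computation."""

WIDTH = 32       # assumed width of the variable that receives the fault
MAX_ITEM = 100   # assumed invariant: every input item lies in [0, MAX_ITEM]


class SoftErrorDetected(Exception):
    """Raised when a detector (assertion or duplicate comparison) fires."""


def flip_bit(value, bit, width=WIDTH):
    """Return value with one bit inverted, modelling a single event upset."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def checked_sum(data, fault=None, use_invariant=True, use_duplication=True):
    """Sum data; fault=(step, bit) flips one bit of the primary accumulator."""
    total = 0    # primary computation
    shadow = 0   # duplicated computation, assumed to stay fault-free
    for step, item in enumerate(data):
        total += item
        shadow += item
        if fault is not None and fault[0] == step:
            total = flip_bit(total, fault[1])
        # Duplication-style check: the two copies must agree after each step.
        if use_duplication and total != shadow:
            raise SoftErrorDetected("duplicate mismatch")
    # Invariant-style assertion: the result must stay inside its known range.
    if use_invariant and not 0 <= total <= MAX_ITEM * len(data):
        raise SoftErrorDetected("range invariant violated")
    return total


def campaign(data, use_invariant, use_duplication):
    """Inject every (step, bit) fault once and classify each outcome."""
    golden = checked_sum(data)  # fault-free reference output
    counts = {"detected": 0, "sdc": 0, "benign": 0}
    for step in range(len(data)):
        for bit in range(WIDTH):
            try:
                result = checked_sum(data, (step, bit),
                                     use_invariant, use_duplication)
            except SoftErrorDetected:
                counts["detected"] += 1
                continue
            # Wrong output with no detection is a silent data corruption.
            counts["sdc" if result != golden else "benign"] += 1
    return counts


if __name__ == "__main__":
    data = [10, 20, 30, 40]
    print("invariant only:", campaign(data, True, False))
    print("invariant + duplication:", campaign(data, True, True))
```

In this toy setting the range assertion catches flips in the high-order bits, while flips in the low-order bits keep the sum inside its valid range and escape as SDC; the duplicate comparison catches those as well. The counts it prints apply only to this example and say nothing about the detection rates reported above.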
Keywords/Search Tags: Single Event Upset, Soft Error, Software Fault Tolerance, Silent Data Corruption, Fault Propagation Analysis, Invariant Technology, Instruction Duplication Mechanism