Effective Software-based Protection Technology Of Soft Error On Source Code Level

Posted on:2022-10-20

Degree:Doctor

Type:Dissertation

Country:China

Candidate:N Yang

Full Text:PDF

GTID:1488306740463714

Subject:Computer application technology

Abstract/Summary:

Soft errors are transient errors caused by Single Event Effects (SEEs), which result from high-energy particles striking the sensitive areas of semiconductor devices. SEEs change the logic state of a device. Soft errors do not damage the internal structure of circuits, but they corrupt the data held in registers or memory cells and so undermine the reliability of software computation. As integrated-circuit manufacturing processes advance, chip integration levels keep rising and supply voltages keep falling, and both trends increase the soft error rate. Soft error protection comprises soft error detection and recovery, and it is carried out at both the hardware and software levels. Compared with hardware-based approaches, software-based approaches are low-cost and easy to design, so they are widely studied and applied.

The outcomes of soft errors can be categorized as benign, crash, hang and SDC (Silent Data Corruption). A benign error is masked during program execution: it does not affect the result, and the program output is correct. A crash causes a program to stop executing and a hang causes it to run without terminating; both can be caught by simple detection systems. When an SDC occurs, the program behaves as it does in a fault-free run, yet its result is wrong. Because the result is usually not suspected, the consequences of an SDC are severe, and SDC is considered the most dangerous and insidious outcome of a soft error. This dissertation focuses on SDC errors and studies soft error protection from two aspects, detection and recovery.

Soft error detection involves determining the location and the form of detectors. The probability that an error produces an SDC depends on the instruction in which the error occurs; that is, instructions differ in their SDC vulnerability. Placing detectors at instructions with high SDC vulnerability benefits soft error detection, so it is important to identify (predict) the SDC vulnerability of instructions. Current identification approaches incur a high fault injection cost. In addition, their adaptability is poor, and the process of collecting the data used for identification (prediction) is complex and difficult. An approach that identifies the SDC vulnerability of instructions based on partial fault injection is proposed to reduce the cost of fault injection, improve adaptability and simplify data collection.

Once the locations of detectors are determined, the form of the detectors must be determined. The logic-based invariant assertion approach uses assertions as detectors of soft errors and has achieved good results. To improve detection efficiency, namely to strike a better trade-off between detection overhead and detection ratio, assertions are screened. However, redundant assertions still remain and impair detection efficiency. In addition, assertion screening does not consider the benign detection ratio or the detection degree, which incurs unnecessary recovery overhead and prevents the detection from focusing on severe SDCs. A program-level assertion screening approach is proposed that screens redundant assertions in a novel way. It further improves detection efficiency and detection degree and reduces the benign detection ratio, thereby achieving a better trade-off between detection overhead and detection ratio, detecting severe SDCs and reducing unnecessary recovery overhead.

When an error is detected, a recovery process is launched. The periodic checkpointing approach does not consider the deployment of detectors, so it cannot fully reduce the time overhead. A checkpointing recovery approach based on the location of detectors is proposed, which aims to reduce the time overhead by optimizing the deployment of checkpoints.

The main contents of this dissertation are summarized as follows:

(1) The SDC vulnerability of instructions is predicted by partial fault injection. The number of fault injections and the performance of the prediction are positively correlated, so reducing the number of fault injections impairs prediction performance. To reduce fault injections as far as possible while still meeting the required prediction performance, partial fault injection is applied to control the rate at which prediction performance declines. Faults are injected into only a subset of the instructions of the target program whose instruction SDC vulnerability is to be predicted. The fault injection results are then used to generate a training dataset, on which a CART (Classification and Regression Tree) is trained. Finally, the CART predicts the SDC vulnerability of the remaining instructions. Partial fault injection keeps the decline in prediction performance small in the early stage, which supports a maximal reduction of fault injections. The data used for prediction is collected from the target programs themselves, which simplifies the collection process. Experimental results show that 45% of the fault injections are sufficient to predict the relative SDC vulnerability of an instruction with respect to other instructions, with a Spearman's rank correlation coefficient of 0.81. The maximum differences in the Spearman's rank correlation coefficient across different programs and across different program inputs are only 0.10 and 0.025, respectively. The proposed approach outperforms the ePVF and PVF approaches in these respects.

(2) A program-level assertion screening approach is proposed. Assertions in a program hardened by logic-based invariant assertions are screened in two stages: screening assertions at individual program points and screening assertions at neighbouring program points. In the first stage, every program point is handled. At a program point, the benign detection ratio and detection degree of each assertion are first determined. The importance of each assertion is then calculated from its benign detection ratio and detection degree. Finally, only the most important assertion is retained at the program point. In the second stage, the assertions remaining after the first stage are first divided into disjoint assertion pairs, and each pair is then handled. For each assertion pair, the redundancy degree of the former assertion with respect to the latter is determined, and the gain and loss of deleting the former assertion are calculated and used to evaluate the profit. The former assertion is deleted if the redundancy degree exceeds a threshold and deleting it yields a profit; otherwise, it is kept. Experimental results show that the detection efficiency of the proposed approach is about twice that of the Radish approach. In addition, the proposed approach reduces the benign detection ratio from 27.8% to 19.2% and increases the detection degree by 10%.

(3) A checkpointing recovery approach based on the location of detectors is proposed. It redeploys the checkpoints of the periodic checkpointing approach by considering the location of detectors. First, checkpoints are initially deployed according to the periodic checkpointing approach; they divide the program into multiple program segments. Then, the time overhead of each program segment is calculated according to the location of detectors. After that, every program segment is handled: the change in time overhead that results from inserting an additional checkpoint into the segment or deleting the checkpoints in it is evaluated. When the time overhead decreases, additional checkpoints are inserted into the program segment or the checkpoints in the program segment are deleted. Experimental results show that, when a single event flip occurs and the resulting error invalidates a detector, the proposed approach reduces the overall program execution time by 15% and 11.4% relative to the periodic checkpointing approach with checkpoint intervals of T/4 and T/3, respectively (T is the original execution time of the program). When a single event flip occurs, the reductions relative to the periodic checkpointing recovery approach with checkpoint intervals of T/4 and T/3 are 16% and 11%, respectively.
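
As a rough illustration of the evaluation metric quoted in item (1), the following Python sketch computes Spearman's rank correlation coefficient between measured and predicted SDC vulnerabilities. It is not the dissertation's code: the function names average_ranks and spearman and the per-instruction values are hypothetical, chosen only to show how a ranking produced by a predictor might be compared with one obtained by fault injection.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical SDC vulnerabilities of five instructions (illustrative only):
# 'measured' stands for values from fault injection, 'predicted' for values
# produced by a trained model on the same instructions.
measured = [0.62, 0.10, 0.35, 0.80, 0.05]
predicted = [0.55, 0.20, 0.15, 0.90, 0.02]

print(round(spearman(measured, predicted), 2))  # prints 0.9
```

A coefficient close to 1 means that the predicted ordering of instructions by SDC vulnerability closely matches the measured ordering, which is the property that matters when deciding where to place detectors.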

Keywords/Search Tags:

soft error, fault tolerance, invariant assertion, software reliability, redundant assertion, checkpoint technology

Related items

1	The Research And Implementation Of Checkpoint Technology Based On WinNT
2	The Research On Soft Error Tolerance Technology Of Integrated Circuit In Nanometer Technologies
3	Low-cost assertion-based fault tolerance in hardware and software
4	Soft Error Sensitivity Analysis And Reliability Optimization Techniques For Digital Integrated Circuits
5	The Software Implemented Fault Tolerance Study Based On COTS DSP
6	Research On The Key Techniques Of Dynamic Soft Error Tolerance Design On High Performance Microprocessor
7	Research On The Fault-Tolerant Technology Of The Joint Servo Controller Based-FPGA
8	Program Oriented Soft Error Tolerance
9	Research And Implementation Of The Automatic Jobs Fault Tolerant Technology Based On Checkpoint
10	Research On Adaption Method Of Cloud Fault Tolerance Services Based On User Requirement And Resource Constriction