Research On The Key Techniques Of Dynamic Soft Error Tolerance Design On High Performance Microprocessor

Posted on: 2013-08-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Cheng
GTID: 1268330392973887
Subject: Electronic Science and Technology
Abstract/Summary:
With the development of integrated circuit technology, modern microprocessors are becoming more and more susceptible to radiation-induced soft errors, which have emerged as a critical challenge for microprocessor design. Architectural soft error tolerance design improves the reliability of microprocessors efficiently and greatly helps in reaching the optimal fault-tolerance design choice, especially at the early design stage. However, traditional architectural soft error tolerance designs often employ redundancy-based techniques, thus inducing additional performance and power overheads. A dynamic fault-tolerance management mechanism, with which system resources can be tuned at runtime according to the dynamic characteristics of soft error vulnerability, provides a potential solution for cost-efficient soft error tolerance design.

This thesis details our research on the key techniques of dynamic soft error tolerance design for modern microprocessors. First, our research focuses on the soft error vulnerability estimation model and method, so as to establish an accurate and efficient estimation framework for reliability analysis. An accurate estimation of soft error vulnerability is the foundation of fault-tolerance design and optimization. Second, we investigate different reliability-oriented phase characterization techniques and phase classification algorithms to exploit the dynamic characteristics of soft error vulnerability. Analysis of the phase behavior helps to reduce the simulation time of programs and to optimize the design based on the phase characteristics. Third, we study reliability-oriented predictive methods, so as to create an accurate predictive model for soft error vulnerability. Accurate reliability prediction is the key technique for dynamic fault-tolerance design. With the ability to predict soft error vulnerability online, the fault-tolerance protection level can be adjusted dynamically, thus avoiding both “no-protection” and “over-protection”.
Finally, we research the dynamic soft error tolerance technique based on accurate soft error vulnerability prediction, so as to provide cost-efficient protection for microprocessors.

The primary innovative works in this thesis are as follows.

1. An accurate and general soft error vulnerability estimation framework is developed. We improve the soft error analysis model and integrate it into a general simulator, thus developing an improved architectural-level soft error reliability analysis framework. This framework can be used to estimate the reliability of various on-chip structures and to guide fault-tolerance design choices. The soft error vulnerability estimation framework is the foundation of dynamic soft error tolerance design.

2. The optimal combination of phase characterization technique and phase classification algorithm for reliability-oriented phase identification is proposed. We evaluate the accuracy of phase characterization techniques based on basic block profiles and on performance metric information in capturing reliability-oriented phase behavior, and then investigate the effectiveness of the k-means clustering and regression tree algorithms in reliability-oriented phase classification. We find that combining performance metric information with the regression tree algorithm achieves the best phase identification for soft error vulnerability. Exploiting the reliability-oriented phase characteristics helps to reveal the inherent relationship between soft error vulnerability and performance metrics, and provides theoretical support for dynamic soft error tolerance design.

3. A Bayesian Additive Regression Trees (BART) based predictive model for accurate soft error vulnerability estimation is created. We conduct a comprehensive comparison among SLR (Simple Linear Regression), BRT (Boosted Regression Trees), and BART, so as to quantitatively validate the superiority and robustness of the BART method in soft error vulnerability prediction.
An accurate soft error vulnerability predictor facilitates the implementation of the dynamic fault-tolerance technique.

4. A prediction-based dynamic soft error tolerance scheme is proposed, along with a reliability evaluation metric that takes energy efficiency into account. We employ the bump hunting technique to obtain a simplified and fast estimation of soft error vulnerability, thus enabling feasible online monitoring of reliability and facilitating the implementation of the dynamic soft error tolerance scheme. Compared with traditional soft error tolerance techniques, dynamic soft error tolerance techniques can achieve the reliability goal with minimum cost. We also use the reliable energy-efficiency metric to evaluate several different fault-tolerance techniques.

This thesis explores dynamic soft error tolerance design for modern microprocessors by researching the soft error estimation model, phase-based characteristics, predictive mechanism, and fault-tolerance techniques. The experimental results provide a potential solution for cost-efficient soft error tolerance design on high performance microprocessors.
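
The idea of adjusting the protection level from an online vulnerability prediction can be illustrated with a minimal sketch. It is not the scheme developed in the thesis: the per-interval predictions, the single fixed threshold, and the names `plan_protection` and `protected_fraction` are all assumptions made for illustration, and the predictor itself (BART or bump hunting in the thesis) is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IntervalDecision:
    predicted_vulnerability: float  # hypothetical predicted vulnerability, in [0, 1]
    protect: bool                   # whether protection is switched on for this interval


def plan_protection(predictions: List[float], threshold: float) -> List[IntervalDecision]:
    """Enable protection only in intervals whose predicted vulnerability
    exceeds the threshold (an assumed, fixed reliability target)."""
    return [IntervalDecision(p, p > threshold) for p in predictions]


def protected_fraction(decisions: List[IntervalDecision]) -> float:
    """Fraction of intervals that pay the protection overhead."""
    if not decisions:
        return 0.0
    return sum(d.protect for d in decisions) / len(decisions)


if __name__ == "__main__":
    # Hypothetical predictions for four execution intervals.
    decisions = plan_protection([0.1, 0.6, 0.3, 0.9], threshold=0.5)
    for i, d in enumerate(decisions):
        print(i, d.predicted_vulnerability, "protect" if d.protect else "skip")
    print("protected fraction:", protected_fraction(decisions))
```

In this toy setting, always-on redundancy would protect every interval, while the threshold policy pays the overhead only in the intervals predicted to be vulnerable, which is the intuition behind avoiding “over-protection”.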
Keywords/Search Tags: Soft Error Vulnerability, Dynamic Soft Error Tolerance, Reliability Estimation, Phase Identification, Predictive Model, Cost-Efficient Design, Reliable Energy-Efficiency