Design And Implementation Of Four Redundancy Fault-tolerant Algorithm For Spaceborne GPU

Posted on:2019-02-04

Degree:Master

Type:Thesis

Country:China

Candidate:W H Zhu

Full Text:PDF

GTID:2322330569495732

Subject:Engineering

Abstract/Summary:

The satellite-borne computer is to a satellite what the brain is to the human body: it controls the operation of the satellite and the execution of its tasks. Once the satellite-borne computer fails, the mission may fail, with very serious and even disastrous consequences. Because of the harsh space environment and the limitations of current software and hardware, the reliability of on-board computers must be effectively guaranteed. At the same time, as satellites are applied more widely, on-board computer hardware is required to offer high performance, low power consumption, small size, and light weight. The rapid development of GPU hardware and software in recent years can address this need. The powerful computing capability of a GPU can handle large-scale, compute-intensive tasks while also reducing power consumption and cost compared with other aerospace-grade chips. However, GPUs are more prone to transient faults because of increased chip integration and lower operating voltages. Therefore, when GPUs are used in aerospace applications with extremely high reliability requirements, suitable fault-tolerance techniques are needed to improve their reliability and reduce the failure rate.

This thesis studies and compares the applicable scenarios, advantages, and disadvantages of various fault-tolerance methods, focusing on hardware and software fault-tolerance techniques. To balance high system reliability against low design complexity, fault tolerance is designed using four redundancy. The NVIDIA Jetson TX2 running Linux is selected as the on-board GPU. Based on the hardware features and software technology of this GPU, the four redundancy fault-tolerant design is implemented from two aspects: CUDA and redundant processes. The core idea of the CUDA-based four redundancy scheme is redundant computation. It combines hardware and software fault-tolerant design concepts to make full use of the redundant resources in the hardware and implements four redundancy fault tolerance at the kernel level, block level, or algorithm design level. The redundant-process scheme has two parts: fault detection and fault recovery. Fault detection is realized by improving the PLR method proposed by Shye et al., and fault recovery is achieved through checkpoint setting and recovery techniques.

Experimental tests and data analysis of part of the fault-tolerant schemes on NVIDIA's CUDA parallel computing platform show that, compared with the CPU, the GPU greatly reduces the time consumed by the computation part through parallel computing, and the acceleration of this part is very significant. The performance of GPU fault-tolerant programs is mainly affected by factors such as the size of the computation, the time consumed by data transfer between the CPU and the GPU, and the time required for the comparisons used in error detection. Reliability analysis shows that the CUDA-based four redundancy fault-tolerant scheme designed in this thesis can greatly improve the reliability of the system and meet the reliability requirements of the on-board GPU.
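The abstract does not spell out how the four redundant results are compared or how recovery is triggered, so the following is only a minimal sketch of the general idea, written in Python rather than CUDA for brevity. It assumes a strict-majority vote over four replica results and a simple "restore the checkpoint and rerun" recovery step. The names `vote4`, `run_with_recovery`, and `square_sum`, the status strings, and the retry limit are all hypothetical and are not taken from the thesis.

```python
from collections import Counter
import copy


def vote4(results):
    """Compare four replica results and return (value, status).

    Assumption: a value is accepted only if at least three of the
    four replicas agree on it (strict majority).
    """
    if len(results) != 4:
        raise ValueError("expected exactly four replica results")
    value, count = Counter(results).most_common(1)[0]
    if count == 4:
        return value, "ok"          # all replicas agree
    if count == 3:
        return value, "masked"      # one faulty replica is outvoted
    return None, "unrecoverable"    # no value has three votes


def run_with_recovery(step, state, max_retries=3):
    """Run `step` on four copies of `state`, vote, and retry on failure.

    The checkpoint is a deep copy of the input state taken before the
    redundant step; each retry restarts every replica from that copy.
    """
    checkpoint = copy.deepcopy(state)
    for attempt in range(max_retries + 1):
        results = [step(copy.deepcopy(checkpoint), replica) for replica in range(4)]
        value, status = vote4(results)
        if status != "unrecoverable":
            return value, status, attempt
    raise RuntimeError("no majority result after retries")


def square_sum(xs, replica):
    """Hypothetical workload; replica 2 simulates a transient error."""
    total = sum(x * x for x in xs)
    return total + 1 if replica == 2 else total


if __name__ == "__main__":
    print(run_with_recovery(square_sum, [1, 2, 3]))
```

In this sketch a single disagreeing replica is masked by the vote, while a two-against-two split forces a rollback to the checkpoint. Results must be hashable (numbers or tuples) for the vote to work; a CUDA version would instead compare output buffers on the device or after copying them back to the host, which is where the comparison and transfer costs mentioned in the abstract would arise.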

Keywords/Search Tags:

GPU, CUDA, Four Redundancy, Fault Tolerant

Related items

1	Research On The Anti-SEU Technology Based On TMR-CUDA Fault Tolerant Architecture
2	Regional Jets, Digital Autopilot Research
3	Design On Fault-tolerant And Redundancy Of Electronic Speed Governor
4	Research On Control Strategy Of Aviation Fault-tolerant Generation System Based On Three-phase Four-Leg Convertor
5	The Research Of DSP Software Redundancy Fault-Tolerant Voting Method Based On Confidence
6	Research On Fault Tolerant And Reconfiguration Algorithm For Redundant Navigation System
7	Research On Fault Diagnosis And Fault-tolerant Control Of ESP Based On Analytical Redundancy
8	Investigation Of Direct Torque Control System For A Fault-Tolerant Permanent Magnet Motor Drive With Redundancy
9	Research On Redundant PMSM And Fault-tolerant Control
10	Small Aircraft Engine Sensor Fault Tolerant Control Technology