With the development of computers and the Internet, the Grid has evolved from a pure high-performance computing system into a system that coordinates distributed, dynamic and heterogeneous resources. However, because of these inherent characteristics, fault tolerance in Grid systems is becoming a difficult problem. In this paper, we propose an adaptive task-level fault-tolerant approach for the Grid, built on four classic fault-tolerant techniques: retry, alternate resource, checkpoint and replication. We implement a prototype of the proposed approach in CGSP (ChinaGrid Support Platform). Simulations and experiments on a real Bioinformatics Grid show that the adaptive approach outperforms the existing approaches.

The main contributions of this paper are:

● We propose a new adaptive fault-tolerant approach that performs better than existing ones in terms of both mean execution time and resource consumption.
● We construct a model of mean execution time using probabilistic methods.
● We propose resource consumption as another important metric for evaluating any fault-tolerant approach.
● We construct mathematical models of resource consumption for the existing approaches and for our adaptive approach.
● Simulation results show that our approach outperforms all the others in terms of both mean execution time and resource consumption.
● Experiments on a real Bioinformatics Grid again show that our adaptive approach performs best.
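The paper's own probability and resource-consumption models are not reproduced here. The following is only a minimal sketch under assumed conditions, meant to show why mean execution time and resource consumption can favour different techniques. It assumes that failures on each resource arrive as a Poisson process with rate `lam`, that a failed task restarts from scratch immediately (retry), that resources fail independently, and that all `k` replicas keep their resources busy until the first replica finishes. For retry it uses the standard closed form (e^{λt} - 1)/λ for restart under Poisson failures and checks it by Monte Carlo. All function names and parameter values are hypothetical.

```python
import math
import random


def expected_time_retry(t: float, lam: float) -> float:
    """Mean completion time of a task needing t units of uninterrupted work
    when failures arrive at rate lam and each failure restarts it from scratch."""
    if lam == 0:
        return t
    return (math.exp(lam * t) - 1.0) / lam


def simulate_retry(t: float, lam: float, rng: random.Random) -> float:
    """One sample of the retry completion time (assumed restart-from-scratch)."""
    elapsed = 0.0
    while True:
        # Time until the next failure; exponential because failures are Poisson.
        failure = rng.expovariate(lam) if lam > 0 else math.inf
        if failure >= t:
            return elapsed + t  # this attempt finishes before failing
        elapsed += failure      # work is lost; retry immediately


def simulate_replication(t: float, lam: float, k: int, rng: random.Random):
    """k independent replicas, each retried on failure. The task completes when
    the first replica completes; all k resources are assumed busy until then."""
    finish = min(simulate_retry(t, lam, rng) for _ in range(k))
    return finish, k * finish  # (execution time, resource consumption)


def mean(xs):
    return sum(xs) / len(xs)


def compare(t=10.0, lam=0.05, k=3, trials=20000, seed=1):
    """Monte Carlo estimates of both metrics for retry and for replication."""
    rng = random.Random(seed)
    retry = [simulate_retry(t, lam, rng) for _ in range(trials)]
    rep = [simulate_replication(t, lam, k, rng) for _ in range(trials)]
    return {
        "retry_time": mean(retry),
        "retry_resource": mean(retry),  # one resource is busy the whole time
        "replication_time": mean([r[0] for r in rep]),
        "replication_resource": mean([r[1] for r in rep]),
    }


if __name__ == "__main__":
    print(expected_time_retry(10.0, 0.05))
    print(compare())
```

Under these assumptions replication shortens the mean execution time but multiplies resource consumption, while retry does the opposite. An approach that is judged on both metrics therefore has a trade-off to manage, which is the motivation the abstract gives for treating resource consumption as a separate metric.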