Fault tolerance is the key technology for keeping a system running with high reliability and high availability. This thesis focuses on improving overall system reliability by designing a fault-tolerance mechanism suited to the characteristics of the Compound Mobile Telecommunication (CMT) system. Supported by an important provincial industrial project on the CMT system, this thesis designs and implements a fault-tolerance mechanism for the CMT system in an all-IP environment to meet its demanding requirements for overall system reliability. The main contributions are as follows:

1) An improved dual hot-redundancy system with a dynamic fault-tolerance mechanism is proposed to provide timely fault detection and non-stop service in the running, IP-connected CMT system. It implements and optimizes a dual hot-redundancy software fault-tolerance framework.
2) A multi-layer fault management mechanism for the CMT system is designed to cope with the diversity of faults. It integrates single-node fault tolerance with inter-node fault tolerance: at the equipment-module level it combines processor self-detection with mutual detection between primary and backup units, and at the system level it introduces a dynamic ring-check algorithm for fault management that supports reconfiguration and service migration.
3) An optimization scheme is proposed to reduce the high system cost of switchover in a distributed multi-processor environment. It adopts checkpointing and rollback-recovery software fault-tolerance techniques to avoid switchover message storms among the distributed processors, at the cost of a small time overhead.
4) The practicability of the fault-tolerance scheme is tested. Steady-state availability is measured through simulated fault injection in order to analyze the reliability and performance of the system's fault tolerance.

The fault-tolerance mechanism designed in this thesis has been implemented in software and works well in the CMT system.
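The steady-state availability evaluated in contribution 4) is conventionally defined as A = MTTF / (MTTF + MTTR), the long-run fraction of time the service is up. The sketch below is a minimal illustration, not the thesis's implementation. It assumes that each fault-injection run is logged as a hypothetical up/down interval, and it shows how an idealized 1+1 hot-standby pair would raise availability. That pair model additionally assumes independent failures and instantaneous, perfect switchover. The names `Interval`, `steady_state_availability` and `dual_hot_standby_availability` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One hypothetical fault-injection run."""
    up_hours: float    # service time before the injected fault took the unit down
    down_hours: float  # time until service was restored (detection + recovery)


def steady_state_availability(intervals):
    """Estimate A = MTTF / (MTTF + MTTR) from observed up/down intervals."""
    if not intervals:
        raise ValueError("need at least one up/down interval")
    n = len(intervals)
    mttf = sum(i.up_hours for i in intervals) / n    # mean time to failure
    mttr = sum(i.down_hours for i in intervals) / n  # mean time to repair
    return mttf / (mttf + mttr)


def dual_hot_standby_availability(a_single):
    """Idealized 1+1 hot standby: the service is down only when both units are down.

    Assumes independent unit failures and perfect, instantaneous switchover,
    so real systems will fall somewhat below this bound.
    """
    return 1.0 - (1.0 - a_single) ** 2


# Example: two injected faults, MTTF = 90 h and MTTR = 10 h.
a = steady_state_availability([Interval(95.0, 5.0), Interval(85.0, 15.0)])
print(a)                                            # → 0.9
print(round(dual_hot_standby_availability(a), 4))   # → 0.99
```

The redundancy formula shows why hot standby helps: a single unit that is unavailable 10% of the time becomes, under these idealized assumptions, a pair that is unavailable only 1% of the time. Switchover delay and common-mode faults, which the thesis's checkpointing and ring-check mechanisms address, would reduce this gain in practice.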