| The information technology revolution, which began with the invention of the computer, has changed people's lives, and the Internet is one of its most significant results. The Internet has transformed how people communicate; people have never been so tightly connected. Computer networks are growing ever faster, but they also face many problems, and network faults are among them. Network faults are abnormal conditions that occur in a network and make it hard to use. Faults bring network services down and cause losses. As the number of networks grows and more hardware and software come into use, network management and maintenance become increasingly complicated. The distributed nature of networks makes faults difficult to locate and recovery even harder, which makes management important. Network fault management is a series of activities that dynamically keep network services working. These activities include detecting faults, finding their causes and correcting them with the appropriate control methods. Fault management supports the normal operation of a computer network and repairs it after a fault occurs. Faults have common properties. Propagation means that a fault in one device causes faults in other devices, which in turn cause further faults. Depending on how long they last, faults can be permanent, temporary or intermittent. Faults can also be classified simply as software faults or hardware faults. The processing of network faults can be divided into three parts: fault collection, fault decoding and fault analysis. Fault collection gathers information from the network equipment on which a fault occurs. Depending on the behavior of the collector, collection can be done in three ways: active, passive and mixed. The active way, such as ping, probes the running condition of network equipment. The passive way, such as SNMP Trap, waits for fault information sent by network agents.
The mixed way uses both together. Different manufacturers sell different equipment; different equipment uses different technologies and produces different faults. These vendor-specific faults must be decoded as correctly as the common faults described in the protocol. Fault decoding recovers the real meaning of fault alerts by interpreting both the commonly defined faults and the enterprise-defined faults in the given protocol formats. Fault decoding is closely tied to the SNMP protocol. Analysis of fault alerts determines the importance of alerts, eliminates false alerts, merges repeated alerts and identifies the original fault. It consists of alert compression, suppression of oscillating alerts and correlation analysis. This paper discusses fault management based on SNMP, the Simple Network Management Protocol, a protocol defined for network management. SNMP involves two parties, the network manager and the managed network equipment. A management process runs on the manager and an agent process runs on the equipment. The agent sends faults and the manager receives them. The protocol follows the popular client/server model. SNMP network management has three parts: the SMI, which defines common data structures and notation; the MIB, a store of parameters that can be queried and set; and the communication protocol between manager and agent. SNMP notifications can indicate failed user authentication, a restart, a closed connection, a communication failure and other events. A trap is an operation initiated by the agent to report a fault at a network node to the manager. Traps use few network resources, are simple to implement and work well for collecting faults. Traps also have a disadvantage: they are carried over UDP, which is unreliable, so ping is used as a complement. This paper designs and implements a practical fault management system based on SNMP.
The system provides fault collection, decoding and storage, along with some analysis functions. The system follows a layered, modular design strategy and an object-oriented approach. It is made up of three levels and two subsystems. The subsystems are divided into functional modules that communicate through interface calls, data sharing and network messaging. The whole system has three levels: a network fault management center, fault collection probes placed in different network areas, and the network equipment itself. The fault management center and the fault collection probes map directly onto functional modules and are implemented as software. These two subsystems communicate through network packets. The system requires that managed equipment implement SNMP and respond correctly to the ping command. The collection subsystem collects faults that occur in network equipment. It contains a protocol module that implements SNMP, a fault collection module that creates sessions and receives faults, a fault decoding module that decodes all faults, whether common or vendor-specific, an alert analysis module that removes useless alerts, and a reporting module that formats the faults and sends reports to the fault management center subsystem. The fault management center subsystem receives, stores and analyzes faults. This subsystem has three modules, including a fault receiving module that gets formatted faults from the collection subsystem and a database module that stores fault messages. The network fault management system designed here treats all faults equally, whether common or vendor-specific, and uses a layered design for collection and handling. The system is functionally complete, efficient and runs stably. The system is coded in Java. It uses MySQL, an open source database. Tests are written with JUnit. |
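
The alert compression step mentioned in the abstract (merging repeated alerts) can be illustrated with a minimal Java sketch. The abstract does not describe the system's actual algorithm, so everything here is an assumption made for illustration: the class names `Alert` and `AlertCompressor`, the fields, and the rule that two alerts are duplicates when they share a source and a type and arrive within a fixed time window of each other.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical decoded alert; the field names are illustrative, not taken from the paper. */
final class Alert {
    final String source;      // address of the device that raised the alert
    final String type;        // decoded alert type, e.g. "linkDown"
    final long firstSeenMs;   // time of the first occurrence
    long lastSeenMs;          // time of the most recent merged occurrence
    int count = 1;            // number of occurrences merged into this alert

    Alert(String source, String type, long timestampMs) {
        this.source = source;
        this.type = type;
        this.firstSeenMs = timestampMs;
        this.lastSeenMs = timestampMs;
    }
}

/** Assumed compression rule: same source and type within windowMs of the last occurrence. */
final class AlertCompressor {
    private final long windowMs;
    private final Map<String, Alert> latestByKey = new HashMap<>();
    private final List<Alert> compressed = new ArrayList<>();

    AlertCompressor(long windowMs) {
        this.windowMs = windowMs;
    }

    /** Returns true if the alert is kept as a new entry, false if it was merged into an earlier one. */
    boolean accept(String source, String type, long timestampMs) {
        String key = source + "|" + type;
        Alert previous = latestByKey.get(key);
        if (previous != null && timestampMs - previous.lastSeenMs <= windowMs) {
            // Repeated alert: count it and extend the window instead of storing it again.
            previous.count++;
            previous.lastSeenMs = timestampMs;
            return false;
        }
        Alert fresh = new Alert(source, type, timestampMs);
        latestByKey.put(key, fresh);
        compressed.add(fresh);
        return true;
    }

    /** Alerts that survived compression, in arrival order. */
    List<Alert> compressed() {
        return compressed;
    }
}
```

With a one-second window, a second `linkDown` alert from the same device 500 ms after the first is merged into it, while the same alert five seconds later is kept as a new entry. Keying on source and type is only one possible choice; a real system might also key on the affected interface or on enterprise-specific trap fields.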