| With the rapid development of cloud computing, traditional data centers struggle to support exponential traffic growth. The software-defined data center (SDDC), built on virtualization and software-defined networking (SDN), has therefore attracted interest from both industry and academia because of its advantages in elastic computing and resource pooling. As complexity increases, however, detecting and recovering from data center faults has become more challenging. This thesis focuses on faults in the SDDC and develops approaches for fault detection and recovery. First, the fixed-period heartbeat detection mechanism for SDN switches has shown drawbacks in practice, such as poor flexibility and difficulty in choosing the detection period. A self-adaptive switch failure detection algorithm is therefore designed, which dynamically adjusts the detection period based on real-time network load and detection response time. To ensure the accuracy of detection results, a secondary detection scheme handles detection failures caused by packet loss and reply timeouts. Second, an OpenFlow-based mechanism for classified network fault detection and reroute recovery is proposed. Because a single fault often triggers multiple detection failure events, the scheme filters and merges the detection results before executing recovery, so that the recovery module receives accurate fault information. The scheme also uses a reroute recovery mechanism based on real-time link load to keep network load balanced after recovery. Third, taking advantage of SDN’s programmability, a cross-layer load balancing recovery mechanism based on network edge detection is designed and implemented. The scheme updates the configuration of the load balancing service according to the SDN controller’s detection results at the network layer, and aims to address two problems. (1) Rerouting cannot handle faults at the network edge, such as a fault between a server and its edge switch. Edge faults can significantly affect user experience, since many virtual machines would be affected, especially in highly virtualized scenarios. (2) The traditional approach to high availability is to provide services through a load balancing cluster with health checks. Wide deployment of load balancer health checks has its own issues, however: the large number of health check requests reduces network utilization and places extra load on computing and networking devices. Finally, to realize the detection and recovery schemes above, an SDDC fault monitoring and recovery system is developed by combining an SDN controller, the OpenStack cloud operating system, and a recovery controller. Effectiveness and performance tests in a production environment show that the system improves data center reliability, enhances user experience, and reduces operating costs. |
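
The self-adaptive detection idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual update rule, so everything here is an assumption made for illustration: the class `AdaptiveProbe`, the function `is_switch_down`, the thresholds, the scaling factors, and the choice to lengthen the period under high load or slow replies. It is not the thesis's algorithm, only one plausible shape for a period that tracks network load and response time, followed by a secondary check before a switch is declared failed.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveProbe:
    """Hypothetical adaptive heartbeat timer; names and constants are assumptions."""
    period: float = 1.0        # current detection period in seconds
    min_period: float = 0.2    # lower bound on the period
    max_period: float = 5.0    # upper bound on the period
    rtt_target: float = 0.05   # assumed "healthy" response time in seconds

    def update(self, link_load: float, rtt: float) -> float:
        """Return the next detection period.

        link_load is a utilization in [0, 1]; rtt is the last probe's
        response time in seconds. The rule below is an assumed heuristic.
        """
        if link_load > 0.8 or rtt > 2 * self.rtt_target:
            self.period *= 1.5    # busy or slow: back off to limit probe traffic
        elif link_load < 0.3 and rtt <= self.rtt_target:
            self.period *= 0.75   # idle and responsive: probe more often
        # Clamp the result to the configured bounds.
        self.period = max(self.min_period, min(self.max_period, self.period))
        return self.period


def is_switch_down(probe, retries: int = 2) -> bool:
    """Secondary detection: report failure only if the first probe and
    every re-probe fail.

    probe is any zero-argument callable that returns True when a reply
    arrives in time. Re-probing stops at the first successful reply.
    """
    if probe():
        return False
    return not any(probe() for _ in range(retries))
```

In this sketch a single lost reply does not mark the switch as failed, which mirrors the stated purpose of the secondary detection scheme: separating real failures from packet loss and reply timeouts.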