| Routers or switches are indispensable components of large-scale networks which serve as infrastructure of cloud computing.They play crucial roles in large-scale network(e.g.,ISP networks,Datacenter networks)performance.Compromised or misconfigured routers sabotage packet delivery and hurt network performance,therefore,they have been a major concern in large-scale networks.Data-plane fault localization(FL)promises to solve this problem.Regrettably,the path-based FL method needs to know the outgoing path in advance,thus fails to support dynamic routing.In this dissertation,we propose two fault localization methods which support dynamic routing.The two methods can quickly detect misbehaving routers at the cost of negligible overhead,and they are secure against current network attacks in the data plane.The contribution of this dissertation is as follows:Firstly,we propose a centralized dynamic fault localization approach – DFL.It exploits the trustful administrative controller(AC)node to collect and monitor the traffic summaries of a node.More specifically,at the beginning of an epoch,the AC node sends the fingerprint function and the sampling function of the next epoch to each node.At the end of epoch,each node uses the traffic packets to generate the traffic summaries computed with the fingerprint function and the sampling function,then sends the traffic summaries to the AC with the reporting path generated by the minimum spanning tree.After the AC node collects the traffic summaries and detect malicious nodes.The AC node detects malicious nodes based on the “flow conservation”.This method needs less key management and does not require the source node to know the outgoing path.Secondly,we propose two algorithms for alleviating link load due to reporting traffic summaries.One algorithm named LevAdj is for network topologies to be hierarchical with strictly laying pattern such as datacenter network(DCN)topologies,and the other algorithm named BinDin is for general network topologies.Compared by minimum spanning tree,both algorithms reduce link load by about 10 times,and BinDin algorithm is a little better than LevAdj algorithm in bandwidth overhead.However,LevAdj algorithm computes 100 times faster than BinDin algorithm.Finally,we propose a distributed dynamic fault localization – D2 FL.It exploits the “flow conservation” the same as DFL,but the detection of a node is implemented by a random two-hop neighbor of the node rather than the AC node.More specifically,it divide the nodes into three roles,the collector,the suspect and the spotter.The collector collects the traffic summaries and make the consistence check;the suspect is the suspicious node to be detected;the spotter generates the traffic summaries and reports the traffic summaries to the collector.We simulate the bandwidth overhead of D2 FL,and find it only incurs about 2.5% bandwidth overhead.We also implement an open source prototype based on Click Modular Router in Linux platform.We test D2FL’s performance with real applications,and the experimental results show that it only imposes about 10% overhead.In addition to support dynamic routing,D2 FL saves much storage space and has no need of global clock synchronization in each node.Also,it is resilient to a single point of failure. |