Research On The Key Issues Of Traffic Measurement In High-Speed Networks

Posted on: 2016-09-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: A P Zhou
Full Text: PDF
GTID: 1318330482475114
Subject: Computer application technology
Abstract/Summary:
Network traffic measurement is the foundation for understanding network operation and network behavior. As bandwidth grows rapidly and the Internet becomes pervasive, traffic measurement faces new challenges. Because massive traffic volumes conflict with limited system resources, traditional traffic measurement algorithms struggle to satisfy application requirements in high-speed networks. Meanwhile, multicore technology has become the prevailing trend in processor architecture, and cloud computing platforms offer powerful parallel and distributed processing capacity for large volumes of network traffic. Parallel and distributed designs based on multicore technology and cloud computing platforms are therefore an effective way to improve traffic measurement algorithms. Although such algorithms are already widely applied to network security, network accounting, traffic engineering, and other areas, many key problems in network traffic measurement remain to be studied and solved.

This dissertation focuses on traffic burstiness, presents related models and traffic measurement algorithms, and addresses the key issues that traffic measurement faces in high-speed networks, so as to provide strong support for network operation and network management. In light of the burstiness of network traffic, a peak traffic metric is proposed, network behavior is analyzed, and a capacity planning model is built to accurately estimate the access bandwidth of a newly built campus network. To address the heavy-tailed nature of network traffic and the load imbalance in MapReduce algorithms, a heavy hitter identification algorithm based on adaptive sampling with MapReduce is proposed. Experiments validate the effectiveness of these models and algorithms.
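The load imbalance that motivates the heavy hitter algorithm can be illustrated with a small sketch. It assumes groups (flows) are assigned to reducers by a hash of the key, estimates flow sizes from a sample, and then places flows on reducers greedily. Everything here is an assumption made for illustration: fixed-rate sampling stands in for the adaptive sampling, whose details the abstract does not give, the greedy placement is not claimed to be the dissertation's partition scheme, and the workload and the names `hash_partition`, `sample_sizes`, and `greedy_partition` are made up.

```python
import random
import zlib
from collections import Counter


def hash_partition(sizes, n_reducers):
    """Per-reducer load when each key goes to crc32(key) % n_reducers."""
    loads = [0] * n_reducers
    for key, size in sizes.items():
        loads[zlib.crc32(key.encode()) % n_reducers] += size
    return loads


def sample_sizes(records, rate, rng):
    """Estimate per-key counts from a fixed-rate sample, scaled by 1/rate."""
    sampled = Counter(key for key in records if rng.random() < rate)
    return {key: count / rate for key, count in sampled.items()}


def greedy_partition(estimates, true_sizes, n_reducers):
    """Place keys, largest estimate first, on the reducer with the lightest
    estimated load; return the true load each reducer ends up with."""
    def est(key):
        # Keys missing from the sample are treated as size 1 (assumption).
        return max(estimates.get(key, 0), 1)

    est_loads = [0] * n_reducers
    true_loads = [0] * n_reducers
    for key in sorted(true_sizes, key=est, reverse=True):
        r = est_loads.index(min(est_loads))
        est_loads[r] += est(key)
        true_loads[r] += true_sizes[key]
    return true_loads


# Made-up heavy-tailed workload: flow i carries about 10000 / (i + 1) packets.
true_sizes = {f"flow{i}": 10000 // (i + 1) for i in range(200)}
records = [key for key, n in true_sizes.items() for _ in range(n)]
estimates = sample_sizes(records, rate=0.05, rng=random.Random(1))

print("hash   :", hash_partition(true_sizes, 4))
print("greedy :", greedy_partition(estimates, true_sizes, 4))
```

Scaling sampled counts by `1 / rate` keeps the size estimates unbiased, which is the property the abstract later claims for its adaptive sampling. How closely the greedy loads match depends on the sampling rate and on how skewed the workload is.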
Because existing methods based on flow sampling suffer from heavy computing load, low detection accuracy, and poor timeliness, a parallel data streaming algorithm for super-spreader detection is presented. Long duration flow detection algorithms based on a shared data structure and on independent data structures are designed to satisfy the requirements of long duration flow detection in high-speed networks; the latter meets these requirements better.

The main work and innovations of this dissertation are as follows:

(1) In light of the burstiness of network traffic, a peak traffic metric is proposed, and a capacity planning model is built to accurately estimate the access bandwidth of a newly built campus network. First, hypothesis testing and goodness-of-fit tests show that peak traffic asymptotically follows a Gaussian distribution, and autocorrelation analysis indicates that peak traffic values are mutually independent. Second, the relation between access bandwidth and peak traffic is studied with an analysis of variance model, which shows that access bandwidth has little influence on peak traffic; the relation among access bandwidth, number of network users, and peak traffic is studied with an analysis of covariance model, which shows that access bandwidth and the number of network users are highly correlated and that the number of network users is the main factor affecting peak traffic. Finally, on the basis of this analysis, a linear regression model and the capacity planning model are constructed.
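As a rough illustration of how a regression on user count could feed a capacity plan, the sketch below fits peak traffic against the number of users by ordinary least squares and then adds a Gaussian safety margin. The abstract does not give the model's actual form, so the margin rule, the function names `fit_line` and `plan_capacity`, the `coverage` parameter, and the observations are all assumptions made for illustration.

```python
from statistics import NormalDist, mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x.

    Returns (a, b, resid_std), where resid_std is the residual standard
    deviation with n - 2 degrees of freedom.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    resid_std = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    return a, b, resid_std


def plan_capacity(n_users, a, b, resid_std, coverage=0.99):
    """Bandwidth covering the predicted peak with probability `coverage`,
    assuming Gaussian residuals and ignoring parameter uncertainty."""
    z = NormalDist().inv_cdf(coverage)
    return a + b * n_users + z * resid_std


# Made-up observations: number of users and daily peak traffic in Mbit/s.
users = [1000, 2000, 3000, 4000, 5000]
peaks = [210.0, 395.0, 610.0, 790.0, 1005.0]

a, b, s = fit_line(users, peaks)
print(f"slope = {b:.4f} Mbit/s per user")
print(f"planned bandwidth for 8000 users: {plan_capacity(8000, a, b, s):.1f} Mbit/s")
```

The Gaussian margin leans on the finding that peak traffic is approximately Gaussian and independent across observations. A full prediction interval would widen the margin further when extrapolating beyond the observed user counts.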
Experiments validate the effectiveness of the capacity planning model.

(2) To address the heavy-tailed nature of network traffic and the load imbalance in MapReduce algorithms, a heavy hitter identification algorithm based on adaptive sampling with MapReduce is proposed. Groups are assigned to reducers by a hash function: if group sizes are uniformly distributed, each reducer receives the same amount of work and the load is balanced; if they are skewed, reducers receive different amounts of work and the load is imbalanced. Adaptive sampling yields an accurate flow size distribution while greatly reducing the computing and storage resources required. One MapReduce job estimates the original flow size distribution through adaptive sampling, and a data partition scheme is derived from it; a second MapReduce job performs heavy hitter identification using that partition scheme. Theoretical analysis shows that the flow size distribution obtained by adaptive sampling is unbiased and that its relative error can be controlled through configuration parameters. Experimental results show that, compared with the default hash-based data partition method and with TopCluster, the proposed algorithm improves the performance of heavy hitter identification and achieves load balancing among reducers.

(3) Because existing methods based on flow sampling suffer from heavy computing load, low detection accuracy, and poor timeliness, a parallel data streaming algorithm for super-spreader detection is presented. With the development of multicore processors, concurrent design has become an effective way to improve algorithm performance. First, a local sketch data structure is created for each thread. When a packet arrives, the bits corresponding to its hash values, computed by the hash functions, are set to one.
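The per-thread update can be sketched as follows, together with a merge at the end of the measurement period that is assumed here to be a bitwise OR. This is a simplified bitmap sketch: it estimates a source's fan-out by linear counting and does not attempt the IP address reconstruction of the dissertation's sketch. The dimensions `ROWS` and `BITS`, the hash construction, and all function names are assumptions made for illustration.

```python
import hashlib
import math
from concurrent.futures import ThreadPoolExecutor

ROWS, BITS = 64, 1024  # hypothetical sketch dimensions


def h(value, salt, modulus):
    """Deterministic salted hash of a string into range(modulus)."""
    digest = hashlib.blake2b(f"{salt}:{value}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % modulus


def build_local_sketch(packets):
    """One thread's private sketch: ROWS integers used as BITS-wide bitmaps."""
    sketch = [0] * ROWS
    for src, dst in packets:
        # The source selects the row, the destination selects the bit to set.
        sketch[h(src, "row", ROWS)] |= 1 << h(dst, "bit", BITS)
    return sketch


def merge(sketches):
    """Combine local sketches with a bitwise OR, row by row."""
    merged = [0] * ROWS
    for sketch in sketches:
        for i, bitmap in enumerate(sketch):
            merged[i] |= bitmap
    return merged


def estimate_fanout(sketch, src):
    """Linear-counting estimate of distinct destinations in src's row.
    Sources that share a row inflate each other's estimate."""
    ones = bin(sketch[h(src, "row", ROWS)]).count("1")
    zeros = BITS - ones
    if zeros == 0:
        return float("inf")
    return -BITS * math.log(zeros / BITS)


# Made-up traffic: one source contacting 200 distinct destinations.
src = "10.0.0.1"
packets = [(src, f"192.168.{i // 250}.{i % 250}") for i in range(200)]
chunks = [packets[i::4] for i in range(4)]  # one chunk per thread

with ThreadPoolExecutor(max_workers=4) as pool:
    local_sketches = list(pool.map(build_local_sketch, chunks))

merged = merge(local_sketches)
print("estimated fan-out:", round(estimate_fanout(merged, src)))
```

Because setting a bit is idempotent and OR is associative, the merged sketch equals the one a single thread would have built from all packets, so the threads need no synchronization during the measurement period. In CPython the threads show the structure rather than deliver a multicore speedup.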
After the measurement period ends, all local sketch data structures are merged. Second, the connection degree of nodes is estimated and the super columns are determined. Finally, Theorem 5.1 is used to reconstruct the IP addresses of nodes from a combination of two columns in the sketch data structure, and the connection degree of these nodes is estimated. If the connection degree of a node exceeds the threshold, the node is identified as a super-spreader. This process is repeated until all combinations of super columns have been handled. Performance analysis and experimental results indicate that the algorithm has good detection accuracy and low overhead.

(4) To satisfy the requirements of long duration flow detection in high-speed networks, parallel methods for long duration flow detection are designed on a multicore hardware platform, one based on a shared data structure and one based on independent data structures. In the algorithm based on a shared data structure, threads share a Cuckoo hash table on which read operations far outnumber write operations, so ReadWriteLock is introduced to synchronize the threads. Because of the heavy synchronization among threads, this algorithm cannot meet application requirements in high-speed networks. To overcome this deficiency, the algorithm based on independent data structures builds a local data structure for each thread; no synchronization among threads is needed, so less overhead is incurred. Performance analysis shows that the long duration flow detection algorithm based on independent data structures has low time and space complexity.
Experimental results indicate that the long duration flow detection algorithm based on independent data structures has good time efficiency and achieves better detection accuracy and flow duration estimates than related algorithms.
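The independent-data-structure design in contribution (4) can be illustrated with a minimal sketch in which each worker owns a private table and no locks are taken. It assumes packets are dispatched by a hash of the flow key so that every flow belongs to exactly one worker, and it uses a plain dictionary as the local structure. Both choices, along with the names `owner`, `track`, and `long_duration_flows`, are assumptions made for illustration rather than the dissertation's design.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def owner(flow_key, n_workers):
    """Deterministic hash so that a flow always maps to the same worker."""
    return zlib.crc32(flow_key.encode()) % n_workers


def track(packets):
    """Private table for one worker: flow -> [first_seen, last_seen]."""
    table = {}
    for ts, flow in packets:
        span = table.setdefault(flow, [ts, ts])
        span[0] = min(span[0], ts)
        span[1] = max(span[1], ts)
    return table


def long_duration_flows(packets, n_workers, threshold):
    """Return {flow: duration} for flows lasting at least `threshold`."""
    shards = [[] for _ in range(n_workers)]
    for ts, flow in packets:
        shards[owner(flow, n_workers)].append((ts, flow))

    # Each worker touches only its own shard and table, so no lock is needed.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        tables = list(pool.map(track, shards))

    result = {}
    for table in tables:
        for flow, (first, last) in table.items():
            if last - first >= threshold:
                result[flow] = last - first
    return result


# Made-up packets as (timestamp in seconds, flow key).
packets = [(0, "a"), (5, "b"), (100, "a"), (7, "b"), (50, "c")]
print(long_duration_flows(packets, n_workers=3, threshold=60))
```

Since a flow never appears in two tables, the final pass only concatenates per-worker results, which is why this layout avoids the read-write locking that the shared Cuckoo hash table requires.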
Keywords/Search Tags: high-speed network, traffic measurement, peak traffic, heavy hitter identification, super-spreader detection, long duration flow