Nowadays, with the rapid development of the Internet and the continual emergence of new network services, network traffic is growing exponentially. Fine-grained network traffic recognition complements firewalls and other security technologies. It is widely used in network planning and management, relieving network congestion, preventing network attacks and so on. High-speed networks place higher demands on traffic recognition technology. Distributed computing frameworks can process large-scale data, so they can handle high-speed network traffic better and keep the network running smoothly. Applying distributed computing frameworks to network traffic recognition has therefore become a new research hot spot.

This thesis describes the theory of network traffic recognition in detail. It analyzes the three most common recognition technologies in depth: port-based recognition, deep flow inspection (DFI) and deep packet inspection (DPI). Based on the requirements of network traffic recognition, the thesis focuses on four pattern matching algorithms used in DPI:

- the KMP (Knuth-Morris-Pratt) algorithm;
- the BM (Boyer-Moore) algorithm;
- the WM (Wu-Manber) algorithm;
- the AC (Aho-Corasick) algorithm.

It compares their principles and computation processes. It then proposes an improved pattern matching algorithm, called the BMF algorithm, which matches text strings more quickly.

With the rapid development of the Internet, traditional network architectures struggle to meet the needs of new network services. Traditional relational data storage and computing will also struggle to keep up with the growing volume of traffic in the future. Applying distributed computing frameworks to large-scale traffic recognition is therefore an inevitable trend. Based on the characteristics of the Hadoop cloud computing platform, this thesis:

- designs the operation process of the MapReduce Boyer-Moore Fast algorithm, built on DPI technology and the MapReduce module;
- applies DPI technology to the Hadoop cloud computing platform;
- builds an experimental Hadoop cluster and captures traffic data for comparative experiments.

The experimental results show that this method can effectively recognize network traffic.

The main work of this thesis is as follows:

(1) We propose an improved pattern matching algorithm, called the BMF algorithm. The BM algorithm uses the good suffix rule and the bad character rule to build two shift tables, which indicate how far the pattern is shifted to the right after a mismatch. BMF optimizes this matching strategy. It abandons the good suffix rule and the construction of the table that rule requires, which simplifies the computation and reduces space complexity. It relies on an improved use of the bad character rule, which increases the maximum shift distance and reduces the number of shifts. The experimental results show that the BMF algorithm improves the efficiency of pattern matching to some extent without reducing matching accuracy.

(2) We design a DPI-based traffic recognition scheme for the Hadoop platform. First, we use the packet capture software Wireshark to capture network traffic and extract packet features from it. Next, we exploit Hadoop's ability to handle large-scale traffic data by combining DPI technology with the MapReduce programming framework. Following the framework's characteristics, we design the operation process of the MapReduce Boyer-Moore Fast algorithm. Finally, we build the corresponding experimental environment and implement DPI-based traffic recognition on the Hadoop cloud computing platform. The experimental results show that, on the Hadoop platform, DPI technology improves the efficiency of traffic recognition while preserving recognition accuracy.
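The abstract does not say exactly how BMF changes the bad character rule. The following is therefore only a sketch of the general idea it describes: Boyer-Moore-style matching driven by the bad character rule alone, with no good-suffix table.

Two well-known variants are shown side by side:

- **Horspool's variant** looks up the text byte aligned with the last pattern position, so its maximum shift is m.
- **Sunday's quick search** looks up the byte just past the current window, so its maximum shift is m + 1.

Sunday's rule is included only as one known way to enlarge the maximum shift. It is not claimed to be the thesis's BMF algorithm. The function names are illustrative.

```python
def horspool_table(pattern: bytes) -> dict[int, int]:
    """Bad-character shifts keyed on the byte under the LAST pattern position.
    Later occurrences overwrite earlier ones, so the rightmost one wins;
    bytes absent from pattern[:-1] shift by the full length m."""
    m = len(pattern)
    return {b: m - 1 - i for i, b in enumerate(pattern[:-1])}


def sunday_table(pattern: bytes) -> dict[int, int]:
    """Bad-character shifts keyed on the byte JUST AFTER the window.
    Bytes absent from the pattern shift by m + 1 (the larger maximum shift)."""
    m = len(pattern)
    return {b: m - i for i, b in enumerate(pattern)}


def horspool_search(text: bytes, pattern: bytes) -> tuple[list[int], int]:
    """Return (match positions, number of windows examined)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m, n = len(pattern), len(text)
    table = horspool_table(pattern)
    matches, attempts, pos = [], 0, 0
    while pos <= n - m:
        attempts += 1
        # Slice comparison stands in for BM's right-to-left character check.
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        pos += table.get(text[pos + m - 1], m)
    return matches, attempts


def sunday_search(text: bytes, pattern: bytes) -> tuple[list[int], int]:
    """Return (match positions, number of windows examined)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m, n = len(pattern), len(text)
    table = sunday_table(pattern)
    matches, attempts, pos = [], 0, 0
    while pos <= n - m:
        attempts += 1
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        if pos + m >= n:  # no byte after the window: nothing left to align
            break
        pos += table.get(text[pos + m], m + 1)
    return matches, attempts
```

For example, `horspool_search(b"z" * 20, b"abc")` examines 6 windows (shift 3 each time). `sunday_search` on the same input examines 5 (shift 4 each time). This shows how a larger maximum shift reduces the number of shifts when pattern bytes are rare in the text.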
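Of the four algorithms studied, AC (Aho-Corasick) is the multi-pattern one. It finds every occurrence of a whole set of signatures in a single pass over a payload, which is why it is common in DPI rule matching.

The following is a minimal textbook-style sketch. It builds the trie and the breadth-first failure links, then scans the text once. It is not the thesis's implementation, and the function names are illustrative.

```python
from collections import deque


def build_automaton(patterns: list[bytes]):
    """Build goto/fail/output tables for the Aho-Corasick automaton."""
    goto: list[dict[int, int]] = [{}]  # state -> {byte: next state}
    fail: list[int] = [0]              # failure link of each state
    out: list[list[bytes]] = [[]]      # patterns recognized in each state
    for p in patterns:                 # 1) insert every pattern into a trie
        state = 0
        for b in p:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(p)
    # 2) Breadth-first pass: depth-1 states fail to the root (already 0);
    #    deeper states follow their parent's failure chain.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for b, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and b not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(b, 0)
            out[t] = out[t] + out[fail[t]]  # inherit shorter suffix matches
    return goto, fail, out


def ac_search(text: bytes, automaton) -> list[tuple[int, bytes]]:
    """Return (start offset, pattern) for every match, in one pass."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, b in enumerate(text):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for p in out[state]:
            hits.append((i - len(p) + 1, p))
    return hits
```

With the classic pattern set `[b"he", b"she", b"his", b"hers"]` and text `b"ushers"`, the scan reports `she` at offset 1, then `he` and `hers` at offset 2. In a DPI setting, the patterns would be protocol signatures rather than these textbook strings.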
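The abstract does not describe the MapReduce job in detail. The following is a local, single-process sketch of how payload signature matching could be laid out as map, shuffle and reduce phases. It does not use Hadoop's API. On a real cluster, mappers run in parallel over input splits and the framework performs the shuffle and sort itself.

Several names are hypothetical:

- `SIGNATURES` and its byte strings are an illustrative rule set, not the thesis's.
- `map_phase`, `shuffle`, `reduce_phase` and `run_job` are illustrative function names.

Python's `in` substring test stands in for the BM-style matcher.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical payload signatures (illustrative only, not a real DPI rule set).
SIGNATURES: dict[str, bytes] = {
    "HTTP": b"HTTP/1.",
    "SSH": b"SSH-2.0",
    "BitTorrent": b"BitTorrent protocol",
}


def map_phase(payload: bytes) -> Iterator[tuple[str, int]]:
    """Map: emit (protocol, 1) for the first matching signature,
    or ("UNKNOWN", 1) when no signature occurs in the payload."""
    for proto, sig in SIGNATURES.items():
        if sig in payload:  # substring search stands in for the BM-style matcher
            yield proto, 1
            return
    yield "UNKNOWN", 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key (Hadoop does this itself)."""
    groups: defaultdict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: count the packets recognized as each protocol."""
    return key, sum(values)


def run_job(payloads: Iterable[bytes]) -> dict[str, int]:
    """Chain the three phases over a batch of captured payloads."""
    pairs = (pair for p in payloads for pair in map_phase(p))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

Keeping the per-packet work inside the map step means the expensive matching parallelizes across mappers, while the reduce step only sums small counts.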