Font Size: a A A

Research Of Parallel String Matching Algorithm

Posted on:2015-08-06Degree:MasterType:Thesis
Country:ChinaCandidate:M HouFull Text:PDF
GTID:2298330422490915Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The importance of network security is growing progressively with the continuousdevelopment of the Internet. Network content detection has become an integral part ofnetwork security system. The technology of network content detection faces challengesin processing large scale data and new application requirements, and string matchingalgorithms are the core technology of the network content detection, so it is veryimportant to maximize the performance of string matching algorithms. However, theperformance of serial string matching algorithms is difficult to meet the demand of thegrowing network, so using parallel approach instead of serial is one of key to solve theproblem. With the rise and development of the hardware, such as GPU (GraphicProcessing Unit), TCAM (Ternary Content Addressable Memory), FPGA (FieldProgrammable Gate Array) and Bloom filter, parallel algorithms based on hardware hasbecome the hot spot of research. At the same time, GPUs have attracted a lot ofattention in the field of high-performance parallel computing due to their cost-effectiveand high throughput.The design of parallel string matching algorithms based on sotfware and hardwarehas been introduced respectively in this paper, and the basic theory of hardwareimplementation has been searched deeply, including GPU, TCAM, FPGA and Bloomfilter. Considering the high-performance parallel computing architecture of GPUs, thispaper proposes PAC (Parallel Algorithm based on AC) algorithm that transplants theclassical AC (Aho-Corasick) algorithm to GPU platform. The preprocessing part ofPAC algorithm performs serially on the CPU while pattern matching part processesconcurrently on GPUs. The input data set is divided into multiple data blocks and eachthread of the GPU processes a data block to perform pattern matching. But the PACalgorithm has to do something called "boundary detection", and it will lead toperformance degradation.In order to eliminate the boundary detection problem of the data parallel algorithms,including PAC. A novel parallel string matching algorithm is proposed, which is namedPAT (Parallel Algorithm based on Trie), and the design of the algorithm is also theinnovation point of this paper. It is realized based on the Trie that all the failuretransitions and the self-loop transition of AC automata are all removed, which is calledPAT automata. Each character of the input stream is assigned a GPU thread to processpattern matching by searching the PAT automata. This paper focuses on the realizationof the PAT algorithm and the advantage of GPU implementation. Also, severaloptimization strategies are put forward on GPUs, including reducing the memorytransaction of global memory, eliminating the access of output table, and reducing the lookup latency of the state transition table.The experimental results show that PAC and PAT which are both performed onNVIDIA GPUs achieve up to4.75and8.43times faster than the AC algorithmrespectively. Through the acceleration technology of GPUs, the performance of stringmatching algorithms improves greatly, and the PAT algorithm based on GPUs achievesbetter performance than PAC algorithm.
Keywords/Search Tags:String Matching, Parallel Algorithm, GPU, AC Algorithm, Trie
PDF Full Text Request
Related items