Font Size: a A A

Superspreader Detection And Node Similarity Estimation In High Speed Graph Stream

Posted on:2022-05-26Degree:MasterType:Thesis
Country:ChinaCandidate:L WenFull Text:PDF
GTID:2480306740994809Subject:Electronics and Communications Engineering
Abstract/Summary:PDF Full Text Request
As graph describe the complex relationship between entities,it is widely used in network anomaly detection,recommendation system,social network analysis,road traffic prediction and other fields.The traditional batch processing method needs to load the whole data structure into memory,which is obviously not suitable for the growing large-scale graph.However,streaming processing only needs limited memory to process big data online,so the research on graph streams is of great significance.The superspreader in graph stream refers to the node whose degree is larger than a threshold.The superspreader in graph usually has special meanings.By estimating the nodes Jaccard similarity in graph stream,it can be used to detect the similarity of network node connection state and analyze the homology of network attack hosts.Therefore,superspreader detection and node similarity estimation in graph stream have important research value for network security.However,existing superspreader detection algorithms occupy too much memory to achieve high estimation accuracy,which is not conducive to the deployment on devices with limited memory.Existing Jaccard similarity estimation algorithms are mainly used to estimate the similarity of static sets and data stream containing element addition,but they fail to deal with element deletion.To solve the above problems,we utilize the improved Hyper LogLog algorithm based on memory sharing to detect the superspreader in the graph stream,and reduces the memory consumption on the premise of ensuring the estimation accuracy.We also propose a method to estimate the Jaccard similarity of nodes in fully dynamic graph streams,which solves the problem of element deletion and improves the estimation accuracy.The main contributions of this paper are as follow:(1)Based on the research of cardinality estimation algorithm and superspreader detection algorithm,we propose the virtual Hyper Log Log(VHLL)algorithm based on the idea of register sharing.VHLL greatly reduces the memory consumption while ensuring the accuracy of superspreader detection.In the update phase,all elements are randomly hashed into a register array.In the estimation phase,a Hyper Log Log estimator is recovered from the common register array to estimate the number of connections of all nodes and find the superspreader that exceed the threshold.In this paper,the estimation bias and variance of VHLL are calculated by theoretical analysis.(2)We study the existing Jaccard similarity estimation algorithms,and analyzes the advantages and disadvantages of the relevant work.We propose an improved Odd Sketch algorithm,multiresolution Odd Sketch(MROS).MROS hashes the graph stream element to the corresponding bit array.If the deleted element appears,the corresponding bit of the element is flipped to realize the deletion of the element.It is the idea of hierarchical sampling of MROS that improves the overall accuracy of node Jaccard similarity estimation.We also prove that MROS estimation is unbiased estimation and show the upper bound of estimation variance through theoretical analysis.(3)We implement the above improved algorithms and compare with the existing superspreader detection and Jaccard similarity estimation algorithms.The advantages of our algorithm are verified through above experiments.We also build a experiment platform for data plane programming based on the idea of software defined network,and the virtual network topology is constructed through mininet.
Keywords/Search Tags:Graph Stream, Sketch, Superspreader Detection, Jaccard Similarity, Network Security
PDF Full Text Request
Related items