| With the advent of the big data era, Industrial Internet of Things systems have developed rapidly. However, the data in an Industrial Internet of Things system is highly heterogeneous in type and structure, and is large-scale, high-dimensional and multi-level. Irrelevant features in large-scale, high-dimensional data can mask anomalies, and hidden anomalies are difficult for a detection system to intercept, so anomaly detection suffers from the "curse of dimensionality". Finding a fast and effective anomaly detection method is therefore an urgent and meaningful task, both for ensuring the security of Industrial Internet of Things systems and for supporting their rapid development. Existing anomaly detection methods for the Industrial Internet of Things have three main problems. First, distance-preserving loss arises when the anomaly detection model is constructed; second, the huge volume of high-dimensional data makes anomaly detection models difficult to train, and the training process is cumbersome, complex and memory-intensive; third, the anomaly detection model does not verify its detection results, which leads to false positives and false negatives and makes hidden anomalies difficult to detect effectively. To address these problems, this paper analyses the advantages and disadvantages of current anomaly detection methods and proposes an anomaly detection algorithm based on a cascaded hierarchical Bloom filter, which combines a locality-sensitive hashing algorithm based on Gaussian random sampling, a bit confirmation scheme and an active detection scheme to perform fast and effective anomaly detection. Finally, the proposed method is compared experimentally with current mainstream anomaly detection algorithms on four different simulated datasets of Industrial Internet of Things systems. The experiments demonstrate that the detection
rate of the proposed method is higher than that of the other anomaly detection algorithms on all four large-scale, high-dimensional datasets, and its false detection rates are all below 10%. The main contributions of this paper are as follows. (1) To address the problem of distance-preserving loss in anomaly detection algorithms, a locality-sensitive hashing algorithm based on Gaussian random sampling is proposed. It strengthens the distance-preserving property of the mapping, projects data from a high-dimensional space into a low-dimensional space with minimal distance loss, and enables fast projection of the data. (2) To handle the large-scale, high-dimensional and multi-level data of Industrial Internet of Things systems, an anomaly detection algorithm based on a Gaussian random sampling locality-sensitive hashing Bloom filter is proposed, combining the compressed storage of Bloom filters with locality-sensitive hashing and related theory. The algorithm can effectively detect anomalies in large-scale, high-dimensional data between the physical layer and the internal network while greatly reducing memory consumption. It adopts a semi-supervised training mode, so only normal data are required to train the anomaly detection model. (3) To address the lack of verification of detection results in anomaly detection models, which leads to false positives and false negatives and makes hidden anomalies difficult to detect effectively, a novel anomaly detection algorithm based on a cascaded hierarchical Bloom filter is proposed. Building on the Gaussian random sampling locality-sensitive hashing Bloom filter, it combines a bit confirmation scheme and an active detection scheme to further verify the detection results, reduce false positives and false negatives, and effectively intercept hidden anomalies. |
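
To make the core idea of contributions (1) and (2) concrete, the following is a minimal sketch of a Bloom filter whose keys are buckets of a Gaussian random-projection locality-sensitive hash, trained on normal data only. It is not the paper's algorithm: the cascaded hierarchy, the bit confirmation scheme and the active detection scheme are omitted, and the class name `GaussianLSHBloomFilter`, its parameters (`n_tables`, `n_proj`, `width`, `n_bits`) and the hash form `floor((a.x + b) / width)` are assumptions chosen for illustration.

```python
import hashlib
import math
import random


class GaussianLSHBloomFilter:
    """Illustrative Bloom filter keyed by Gaussian random-projection LSH buckets."""

    def __init__(self, dim, n_tables=4, n_proj=3, width=4.0, n_bits=1 << 16, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.n_bits = n_bits
        self.bits = bytearray((n_bits + 7) // 8)
        # Each table holds n_proj projections (a, b) with a ~ N(0, I), b ~ U[0, width).
        self.tables = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(n_proj)]
            for _ in range(n_tables)
        ]

    def _bit_positions(self, x):
        # One Bloom filter bit per table, derived from that table's bucket key.
        for t, table in enumerate(self.tables):
            key = tuple(
                math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.width)
                for a, b in table
            )
            digest = hashlib.sha256(repr((t, key)).encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, x):
        """Store a normal (training) sample by setting its bits."""
        for pos in self._bit_positions(x):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def is_anomalous(self, x):
        """Flag x if any of its buckets was never seen during training."""
        return any(
            ((self.bits[pos // 8] >> (pos % 8)) & 1) == 0
            for pos in self._bit_positions(x)
        )


# Semi-supervised use: train on normal samples only, then query new points.
rng = random.Random(1)
normal = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]
detector = GaussianLSHBloomFilter(dim=8)
for sample in normal:
    detector.add(sample)
```

Because nearby points tend to fall into the same buckets, a sample close to the training data usually finds all of its bits set, while a sample far from it usually hits at least one unset bit. Memory use is fixed by `n_bits` regardless of how many training samples are added, which is the compressed-storage property the abstract refers to. A single filter of this kind still yields false positives and false negatives, which is the gap the paper's verification schemes are meant to close.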