
Research On Real-time Anomaly Detection Of Massive Log Streams Based On DME Cluster Analysis Model

Posted on: 2018-09-26   Degree: Master   Type: Thesis
Country: China   Candidate: G P Wu   Full Text: PDF
GTID: 2348330515462873   Subject: Computer technology
Abstract/Summary:
As security issues in network environments become increasingly prominent, a robust application writes key information to its logs: service status such as the current memory footprint and CPU utilization, request and response information, and system failure traces. Log analysis is therefore the most common, and a comparatively effective, method of anomaly detection. This thesis focuses on real-time anomaly detection for log streams, in particular on the effectiveness and performance of the detection and of the real-time computation. The specific research contents are as follows:

(1) Log stream pre-processing. Raw log information is redundant and disordered, and feeding it in text form to the anomaly detection module greatly reduces the module's efficiency and accuracy while increasing the complexity of the clustering algorithm. The lossless compression algorithm LLCA (Log-stream Lossless Compression Algorithm) is used to process the log data. The relationship between information entropy and lossless compression is used to characterize text strings by their information content, and the log attributes are digitized, so that the logs are ultimately converted into numerical form.

(2) Log stream anomaly detection based on DME (Dimension based Maximum Entropy clustering analysis algorithm). Cluster analysis is a newly popular method for log stream anomaly detection. To realize real-time anomaly detection on massive log streams, this thesis proposes the DME clustering algorithm. DME addresses and optimizes three aspects of the traditional density-grid-based clustering algorithm:
1) Unstable clustering quality. DME removes the need for manual parameter setting by introducing the principle of maximum information entropy together with related similarity methods, which improves the stability of clustering.
2) Computational and space complexity. The space is divided along each dimension, groups within a dimension are connected to form dimension clusters, and micro-cluster structures are ultimately formed. This avoids the exponential relationship between the number of grids and the number of dimensions in the traditional approach, greatly reducing the number of grids and therefore the computation and space cost of clustering.
3) Anomaly detection performance. The concept of dimension information entropy is introduced to amplify the amount of abnormal information and make detection more effective. Building on the sliding window model and the Ebbinghaus forgetting curve, a new data stream processing model, MCDW (main cluster damping attenuation window model), is proposed to optimize the storage of historical information and improve clustering quality.
Finally, the validity and performance of the model are verified on the real KDD CUP-99 data from the UCI international standard data sets and on real SSH service logs.

(3) Implementation of the real-time computing system framework. This thesis designs and implements DME-MLRADS (Real-time Anomaly Detection System of Massive Log-stream based on Dimension Maximum Entropy clustering algorithm) on the Flink framework. The experimental results show that the system is accurate and effective, and its performance is compared with that of a Hadoop-based system.
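
To make the pre-processing idea in (1) concrete, the following is a minimal sketch of how a textual log attribute could be turned into a number using its information content. It is not the thesis's LLCA algorithm, whose details the abstract does not give. It only illustrates the underlying relationship: a value that occurs with empirical probability p carries -log2(p) bits, so rare values map to larger numbers. The function names and the sample `users` column are hypothetical.

```python
import math
from collections import Counter


def self_information(values):
    """Map each distinct value to its information content -log2(p),
    where p is the value's empirical frequency in `values`."""
    counts = Counter(values)
    total = len(values)
    return {v: -math.log2(c / total) for v, c in counts.items()}


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical "user" attribute extracted from four ssh log lines.
users = ["root", "root", "root", "admin"]

info = self_information(users)          # per-value information content
numeric_column = [info[u] for u in users]  # the attribute in numerical form
entropy = shannon_entropy(users)        # average information of the attribute
```

In this sketch the rare value `admin` receives a larger number than the frequent value `root`, which is the sense in which an entropy-based encoding can make unusual log entries stand out before clustering.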
Keywords/Search Tags:log flow, cluster analysis, maximum entropy of information, Flink