| System logs play an important role in network information security infrastructure: they faithfully record the states and behaviors of an information system. Logs generated by a large distributed system can, for the most part, be viewed as streaming data that arrives continuously. Compared with traditional data, such log data is dynamic, unordered, unbounded, bursty, and large in volume. Because it arrives in such large volumes, storing it completely is almost impossible. Each record also has a time attribute, a timestamp marking the moment the event occurred. As the data evolves over time, concept drift occurs: the current logs differ in some way from earlier logs. Because logs play a vital role in network security, anomaly detection for system logs under concept drift is of great importance. This paper proposes two methods that adapt to concept drift in system logs. Conformal prediction is introduced as the means of adapting to concept drift, combining statistical learning with machine learning to improve system log anomaly detection. The first method is an online, confidence-based anomaly detection model for the dynamic logs of online systems, in which logistic regression serves as the nonconformity measure module that supplies nonconformity scores. First, log blocks replayed in simulated time order are received through a sliding window, and p-values are obtained after preprocessing and pre-training. Next, confidence is computed from the conformal prediction scores, and anomalous logs are filtered out using a significance level. Finally, confidence is used to relate the current logs to earlier ones, and the calibration set is updated dynamically. As a result, the model can be deployed in an online environment and detects log anomalies quickly and accurately.
The second method is a system log anomaly detection model based on Log Cluster. In the training stage, the model uses the log clustering method Log Cluster as the nonconformity measure module and combines it with bootstrap sampling to overcome the excessive memory consumption of the original method in offline mode. In the detection stage, a p-value is computed for each cluster and the maximum p-value is taken. Each log is then classified by comparing this maximum p-value with the maximum tolerated error probability, that is, the significance level, which completes the conformal prediction and produces the output. Compared with traditional unsupervised clustering, the Log Cluster based model adapts to concept drift in the logs and therefore achieves better system log anomaly detection results. |
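The online, confidence-based method can be illustrated with a minimal sketch. It assumes each log block has already been preprocessed into a numeric feature vector (for example, event counts) and that label 1 marks an anomaly. Logistic regression is fitted by hand with gradient descent so the example has no dependencies, and the nonconformity score is taken as 1 minus the predicted probability of a label. Confidence follows the usual conformal prediction definition (1 minus the second-largest p-value). The function names `train_logreg`, `nonconformity`, `p_value`, and `detect_stream`, the parameters `window`, `epsilon`, `min_conf`, and `max_cal`, the use of consecutive fixed-size blocks as the window, and the rule that only confident predictions are fed back into a bounded calibration set are all hypothetical choices, not the paper's implementation.

```python
import math
from collections import deque


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=3000):
    """Binary logistic regression (label 1 = anomalous) fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * v for wj, v in zip(w, xi)) + b) - yi
            gw = [g + err * v for g, v in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def nonconformity(model, x, label):
    """1 - P(label | x): large when the model finds `label` implausible for x."""
    w, b = model
    p1 = sigmoid(sum(wj * v for wj, v in zip(w, x)) + b)
    return 1.0 - (p1 if label == 1 else 1.0 - p1)


def p_value(cal_scores, score):
    """Share of calibration scores at least as nonconforming as `score` (+1 smoothing)."""
    return (sum(s >= score for s in cal_scores) + 1) / (len(cal_scores) + 1)


def detect_stream(model, cal_scores, stream, window=4, epsilon=0.1,
                  min_conf=0.9, max_cal=500):
    """Score `stream` block by block; return per-log (is_anomaly, confidence)
    pairs and the updated calibration set."""
    cal = deque(cal_scores, maxlen=max_cal)  # oldest scores drop out first
    results = []
    for start in range(0, len(stream), window):
        snapshot = list(cal)  # calibration set as it stood before this block
        for x in stream[start:start + window]:
            p = {y: p_value(snapshot, nonconformity(model, x, y)) for y in (0, 1)}
            is_anomaly = p[0] <= epsilon        # "normal" rejected at level epsilon
            confidence = 1.0 - min(p.values())  # 1 - second-largest p-value (two labels)
            results.append((is_anomaly, confidence))
            if confidence >= min_conf:
                # Hypothetical update rule: recycle confident predictions as calibration data.
                predicted = max(p, key=p.get)
                cal.append(nonconformity(model, x, predicted))
    return results, cal
```

To use it, fit the model on a training split, compute `nonconformity(model, x, label)` for every example of a separate calibration split, and pass those scores with the incoming stream to `detect_stream`. Scoring every log in a block against the same snapshot keeps the block's own predictions from influencing each other, while the bounded `deque` lets older calibration evidence age out as the log distribution drifts.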
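The detection stage of the Log Cluster based method (one p-value per cluster, keep the maximum, compare it with the significance level) can be sketched under several stated assumptions. A simple greedy cosine-threshold clustering stands in for Log Cluster, whose actual clustering procedure is not reproduced here; the nonconformity of a log vector with respect to a cluster is assumed to be 1 minus its cosine similarity to the cluster centroid; and a fixed-size bootstrap sample drawn with replacement bounds how many training vectors the clustering step has to hold. `greedy_cluster`, `fit`, `detect`, `sample_size`, and `threshold` are hypothetical names.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two event-count vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def greedy_cluster(vectors, threshold=0.8):
    """Simplified stand-in for Log Cluster: join the most similar centroid if its
    cosine similarity reaches `threshold`, otherwise open a new cluster."""
    centroids, members = [], []
    for v in vectors:
        sims = [cosine(v, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            members[best].append(v)
            k = len(members[best])
            centroids[best] = [sum(m[j] for m in members[best]) / k for j in range(len(v))]
        else:
            centroids.append(list(v))
            members.append([v])
    return centroids


def fit(train, calib, sample_size=200, threshold=0.8, seed=0):
    """Cluster a bootstrap sample of `train`, then build per-cluster calibration scores."""
    rng = random.Random(seed)
    sample = [rng.choice(train) for _ in range(sample_size)]  # drawn with replacement
    centroids = greedy_cluster(sample, threshold)
    cal = [[] for _ in centroids]
    for v in calib:
        sims = [cosine(v, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__)
        cal[best].append(1.0 - sims[best])  # nonconformity to its nearest centroid
    # Keep only clusters that received calibration examples.
    return [(c, s) for c, s in zip(centroids, cal) if s]


def detect(model, x, epsilon=0.1):
    """Return (is_anomaly, max p-value): x is anomalous if it fits no cluster at level epsilon."""
    p_values = []
    for centroid, scores in model:
        a = 1.0 - cosine(x, centroid)
        p_values.append((sum(s >= a for s in scores) + 1) / (len(scores) + 1))
    p_max = max(p_values)
    return p_max <= epsilon, p_max
```

Taking the maximum over clusters means a log is flagged only when it is nonconforming with respect to every known cluster; a log that fits any one pattern well keeps a high p-value. With `n` calibration scores in a cluster, the smallest attainable p-value is `1/(n+1)`, so each cluster needs enough calibration examples for a small `epsilon` to be reachable at all.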