Research And Implementation Of Anomaly Detection Algorithms Based On Log Data

Posted on: 2022-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: X D Wu
Full Text: PDF
GTID: 2492306317460144
Subject: Weapons systems, and application engineering
Abstract/Summary:
Anomaly detection is an important step in resolving system failures. In recent years, the number of network users has increased dramatically, and network attack methods are constantly being updated, which places higher demands on anomaly detection technology. Commonly used approaches, including code-based anomaly detection and social network-based anomaly detection, have certain limitations, so it is very important to select a suitable data source as the detection object. Log data contains a large amount of information that reflects the various conditions of system operation, so analyzing it to detect anomalies and respond in time is of great significance.

This thesis aims to improve the effectiveness of anomaly detection based on log data. Through an in-depth study of dataset processing and machine learning methods, a relatively complete theoretical framework for log data anomaly detection is formed. Because log data exhibits the "3V" characteristics of big data (volume, velocity, variety), corresponding solutions are proposed to provide a useful reference for anomaly detection on large volumes of logs. Following the sequence of research status, problem discovery, problem analysis, experiment design and experiment verification, the thesis makes the following contributions:

1. The real network defense dataset CSE-CIC-IDS2018 was obtained and its characteristics were analyzed. The length and number of forward packets showed the most visible distinguishing characteristics, followed by the interval time between two packets. The dataset was preprocessed, features were ranked by importance, the most important features were selected, and feature crossing and feature combination were performed, yielding a clean and regular dataset that can be used as the input of a classification model.

2. The performance of a variety of commonly used classification models on the dataset was studied, and their effectiveness in detecting DDoS attacks was compared from different perspectives. First, the parameters of each model were tuned to obtain its best configuration. Second, the tuned models were trained to obtain their classification results. Finally, the detection results of the models were compared from different angles. The random forest model achieved the best detection results and the RNN model the second best; both performed very well.

3. An improved anomaly detection method was proposed. In this method, the original dataset is divided into several sub-datasets and the same classification model is trained on each of them. The best sub-dataset is selected as the representative of the dataset, and the improved method is compared with the original method. The experiments show that the improved method detects abnormal log data effectively, improving both the running speed of the algorithm and its accuracy in detecting abnormal data, which provides a new research idea for large-scale log data anomaly detection.

4. Anomaly detection algorithms were implemented in a distributed environment. An experimental environment combining the Hadoop distributed platform and the parallel computing framework Spark was built, and the SVM algorithm in Spark's MLlib was used for anomaly detection. The results show that the distributed environment has little effect on accuracy, recall and F1, but the running time of the algorithm is significantly shorter than on a single machine.
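The sub-dataset idea in item 3 can be illustrated with a minimal Python sketch. The abstract does not say how the data is partitioned, which classifier is used, or how the "best" sub-dataset is chosen, so everything here is an assumption made for illustration: a random round-robin split, a simple nearest-centroid classifier standing in for the thesis's models, F1 on a held-out validation set as the selection criterion, and two synthetic features standing in for real CSE-CIC-IDS2018 columns. The function names (`make_rows`, `fit_centroids`, `select_best_subdataset`) are hypothetical.

```python
import random
from statistics import mean


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive (attack) class; returns 0.0 if there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_centroids(rows):
    """Nearest-centroid 'training': one mean feature vector per label (assumed model)."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*xs)] for label, xs in by_label.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=dist2)


def select_best_subdataset(train, valid, k, seed=0):
    """Split `train` into k sub-datasets, train the same model on each,
    and return (best_f1, best_index, best_model) scored on `valid`."""
    rng = random.Random(seed)
    shuffled = train[:]
    rng.shuffle(shuffled)
    subsets = [shuffled[i::k] for i in range(k)]  # round-robin split (assumption)
    y_true = [label for _, label in valid]
    best = None
    for idx, subset in enumerate(subsets):
        model = fit_centroids(subset)
        y_pred = [predict(model, features) for features, _ in valid]
        score = f1_score(y_true, y_pred)
        if best is None or score > best[0]:
            best = (score, idx, model)
    return best


def make_rows(n, seed):
    """Synthetic stand-in data: label 1 = attack, label 0 = benign (not real log data)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        center = 5.0 if label == 1 else 0.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows


if __name__ == "__main__":
    score, idx, _ = select_best_subdataset(make_rows(400, 1), make_rows(100, 2), k=4)
    print(f"best sub-dataset: {idx}, validation F1: {score:.3f}")
```

Each candidate model sees only 1/k of the training rows, which is one plausible reason a method of this kind could run faster than training on the full dataset; whether accuracy also improves depends on the data and is a claim of the thesis, not something this toy example demonstrates.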
Keywords/Search Tags: Anomaly detection, Log data, Machine learning, DDoS, Distributed