Research On Out-of-distribution Anomaly Detection Technology For Network Traffic

Posted on: 2023-06-24  Degree: Master  Type: Thesis
Country: China  Candidate: J Z Che  Full Text: PDF
GTID: 2558306839995199  Subject: Cyberspace security
Abstract/Summary:
As artificial intelligence becomes more widespread, its applications reach into nearly every field. Machine learning, one of the most effective ways to achieve artificial intelligence, uses algorithms to parse data, learn from it, and make decisions and predictions about real-world events. Deep learning (DL) is a newer research direction within machine learning. It combines low-level features into more abstract high-level representations in order to discover distributed features of the data, and it has produced many results in search technology, data mining, and related fields. Deep learning is also widely applied in science and engineering, for example in bioinformatics, healthcare, and cyber security, where it is used to make important decisions.

In practical engineering, however, there is no guarantee that every input belongs to a known category. Studies have shown that deep learning models can assign samples from unknown categories to known categories with high confidence. Such samples are known as out-of-distribution (OOD) data, or anomalies. In some fields these errors can have serious consequences, so detecting unknown-category data among the test samples is essential.

This thesis studies out-of-distribution anomaly detection for network traffic. Based on the characteristics of out-of-distribution data, it proposes two detection methods: one based on the likelihood ratio and one based on the Mahalanobis distance. The likelihood-ratio method improves the reliability of classification by training two models. The original model is trained on in-distribution data. OOD data is simulated by adding noise to the in-distribution data, and the disturbed model is trained on this noisy data. The likelihood ratio of the two models is then calculated to decide whether a sample is OOD. Because the likelihood-ratio method relies heavily on hyperparameters, a second method is proposed that calculates the Mahalanobis distance scores of the original samples and the test samples to decide whether the test samples contain out-of-distribution data.

To evaluate the two methods, a BP neural network, a convolutional neural network, naive Bayes, a decision tree, a support vector machine, and other machine learning algorithms are used to train the original model and the disturbed model on a public traffic dataset and on collected data. Experimental results show that, for the first method, the support vector machine performs best, with a recognition accuracy of up to 92.1%; the accuracy of the second method reaches 95%. Finally, the thesis designs an update system for the original model that identifies the out-of-distribution data types in the test samples and adds them to the original model to improve its robustness.
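The two scores described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the diagonal Gaussian density model, the noise level, the sign convention of the ratio (higher means more in-distribution), and all function names (`fit_diag_gaussian`, `llr_score`, `mahalanobis`, and so on) are assumptions made for illustration, and the toy two-feature data stands in for real traffic features.

```python
import math
import random


def fit_diag_gaussian(rows):
    """Fit per-feature mean and variance (a deliberately simple density model)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    var = [sum((r[j] - mean[j]) ** 2 for r in rows) / n + 1e-6 for j in range(d)]
    return mean, var


def log_likelihood(x, model):
    """Log density of x under a diagonal Gaussian (mean, var)."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def llr_score(x, original, disturbed):
    """Log-likelihood ratio of the original model to the disturbed model.

    Assumed convention: low scores suggest out-of-distribution input.
    """
    return log_likelihood(x, original) - log_likelihood(x, disturbed)


def covariance(rows, mean):
    """Sample covariance matrix of the rows around the given mean."""
    n, d = len(rows), len(mean)
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(x, mean, cov):
    """Mahalanobis distance sqrt((x - mean)^T cov^{-1} (x - mean))."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = solve(cov, diff)  # z = cov^{-1} diff, without forming the inverse
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


# Toy data: two synthetic features standing in for traffic statistics.
rng = random.Random(0)
train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
# Simulated OOD data: in-distribution samples plus Gaussian noise (sigma is assumed).
noisy = [[v + rng.gauss(0, 2.0) for v in row] for row in train]

original = fit_diag_gaussian(train)
disturbed = fit_diag_gaussian(noisy)
train_mean = original[0]
train_cov = covariance(train, train_mean)

for sample in ([0.1, -0.2], [6.0, 6.0]):
    print(sample,
          "LLR:", round(llr_score(sample, original, disturbed), 3),
          "Mahalanobis:", round(mahalanobis(sample, train_mean, train_cov), 3))
```

In this sketch a sample would be flagged as OOD when its log-likelihood ratio falls below a threshold or its Mahalanobis distance rises above one. Both thresholds are hyperparameters that would have to be tuned on held-out data; the abstract does not state how they were chosen.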
Keywords/Search Tags:cyber security, out-of-distribution data, machine learning, likelihood ratio, Mahalanobis distance