Research On Network Anomaly Traffic Detection Based On Ensemble Learning

Posted on: 2023-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: X J Zhao
GTID: 2558306908450394
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of Internet technology, networks bring great convenience to daily life and work, but network security also faces serious challenges. In particular, the Internet of Everything envisioned for sixth-generation (6G) mobile communication makes network security protection increasingly urgent and important. As new technologies and network architectures emerge, network traffic grows exponentially, and the form and category distribution of the data become more complex and diverse. Traditional approaches, whether manual labeling by network security experts or models built with simple machine learning methods, are inefficient and cannot meet the demands of real-time, high-efficiency detection.

As research on network anomaly traffic detection has deepened, researchers have found that the data used for detection are imbalanced and that data labels are very difficult to obtain. Both problems affect the overall performance of the detection model and its performance on individual categories. Meanwhile, ensemble learning has continued to develop: by simply combining multiple base classifiers, it achieves better performance than traditional machine learning methods.

This thesis studies two problems that strongly affect detection performance in network anomaly traffic detection: data imbalance and the lack of labels. To address the shortcomings of existing methods for these two problems, it proposes one solution that combines ensemble learning with data generation and another that combines ensemble learning with semi-supervised learning, in order to detect anomalous network traffic and support network security protection. The main research work is as follows:

1. To address data imbalance in network anomaly traffic detection, a solution based on ensemble learning combined with a data generation method is proposed. First, the imbalance is considered at the data level: a data generation algorithm is proposed that generates minority-class data while ensuring that the distribution of the original data does not change, which initially alleviates the imbalance. Second, a variety of single classifiers are compared experimentally, and the top three are selected: Random Forest (RF), Decision Tree (DT), and K-Nearest Neighbor (KNN). Finally, the ensemble classifier built from them further alleviates the imbalance; it is applied to the dataset augmented with the generated data to obtain the final detection result. The experimental results show that the model improves overall detection performance and greatly improves recognition of individual categories, especially minority categories, which indicates that the model alleviates the data imbalance problem and achieves better detection performance.

2. To address the lack of labeled data in network anomaly traffic detection, an ensemble learning solution combined with semi-supervised learning is proposed. First, pseudo-labeling, a semi-supervised technique, is used to assign pseudo-labels to unlabeled data. The main concern is the accumulation of errors caused by incorrect pseudo-labels. A method for calculating pseudo-label reliability is designed, in which the two differences are weighted and fused to improve the reliability of the pseudo-labels. In addition, two classifiers, KNN and Support Vector Machine (SVM), are integrated to provide more candidates for pseudo-label selection, improve the reliability of the selected pseudo-labels, and reduce the impact of incorrectly assigned labels as far as possible. The experimental results show that, with different amounts of labeled data, the method effectively improves detection performance, makes better use of unlabeled data, and improves recognition of individual categories.

In addition, combining the pseudo-label ensemble algorithm with the data generation algorithm achieves better recognition of minority-category data while making full use of unlabeled data, so that the model can better support network security protection.
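The first contribution (generate minority-class data, then classify with an ensemble) can be illustrated with a minimal sketch. The abstract does not specify the thesis's data generation algorithm, so the code below swaps in a SMOTE-style interpolation between minority neighbours as a stand-in, and it uses plain hard majority voting over arbitrary classifier callables in place of the RF, DT, and KNN ensemble. The names `oversample_minority` and `majority_vote` and all parameters are hypothetical.

```python
import random
from collections import Counter


def oversample_minority(X, y, minority_label, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolation.

    Assumption: SMOTE-style interpolation stands in for the thesis's
    own (unspecified) generation algorithm. Needs >= 2 minority points.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point (itself excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        # New point lies on the segment between base and its neighbour
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    # Return new lists; the inputs are not modified
    return X + synthetic, y + [minority_label] * n_new


def majority_vote(classifiers, x):
    """Hard voting: each classifier is a callable x -> label."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]
```

In practice the three callables would wrap trained RF, DT, and KNN models, and the augmented data returned by `oversample_minority` would be used to train them. Whether the thesis uses hard voting, soft voting, or another fusion rule is not stated in the abstract.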
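The second contribution (pseudo-labels filtered by a reliability score, with two classifiers contributing) can be sketched in the same spirit. The abstract does not give the reliability formula, so this sketch assumes a simple rule: accept a pseudo-label only when both classifiers agree and a weighted sum of their confidences reaches a threshold. A nearest-centroid classifier stands in for the SVM to keep the example dependency-free. The function names, the weight `w_knn`, and the `threshold` are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(X, y, x, k=3):
    """Return (label, confidence); confidence is the vote share among k neighbours."""
    nearest = sorted(zip(X, y), key=lambda pair: math.dist(pair[0], x))[:k]
    label, count = Counter(lbl for _, lbl in nearest).most_common(1)[0]
    return label, count / len(nearest)


def centroid_predict(X, y, x):
    """Nearest-centroid stand-in for the SVM (assumption, not the thesis's model)."""
    dists = {}
    for lbl in set(y):
        pts = [p for p, l in zip(X, y) if l == lbl]
        centroid = [sum(c) / len(pts) for c in zip(*pts)]
        dists[lbl] = math.dist(centroid, x)
    label = min(dists, key=dists.get)
    total = sum(dists.values())
    # Confidence grows as the winning centroid gets relatively closer
    conf = 1.0 - dists[label] / total if total > 0 else 1.0
    return label, conf


def select_pseudo_labels(X_lab, y_lab, X_unlab, w_knn=0.5, threshold=0.8):
    """Keep only unlabeled points whose pseudo-label looks reliable."""
    accepted = []
    for x in X_unlab:
        l1, c1 = knn_predict(X_lab, y_lab, x)
        l2, c2 = centroid_predict(X_lab, y_lab, x)
        score = w_knn * c1 + (1 - w_knn) * c2  # weighted fusion (assumed form)
        if l1 == l2 and score >= threshold:
            accepted.append((x, l1, score))
    return accepted
```

Rejecting low-score points is one way to limit the error accumulation the abstract mentions: an unreliable pseudo-label never enters the training set, so it cannot reinforce itself in later rounds.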
Keywords/Search Tags: Network Traffic, Anomaly Detection, Ensemble Learning, Semi-supervised Learning
Related items