
Research On Network Protocol Traffic Identification Based On Internet DataFlow

Posted on: 2024-01-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D W Wei
Full Text: PDF
GTID: 1528306914974259
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the number and types of network applications have increased significantly. On the one hand, this has greatly enriched people’s daily lives. On the other hand, it has made Internet governance more difficult, which urgently requires more intelligent and efficient governance techniques and tools. By identifying the type, content, and source of protocol traffic, network protocol traffic identification can detect abnormal behavior in a timely manner, which is of great significance for maintaining network security. Research on network protocol traffic identification has made some progress. However, as Internet governance requirements become more refined, network protocol traffic identification faces challenges such as multiple application scenarios, encrypted transmission, and complex requirements. To address these issues, this thesis studies three topics: an identification algorithm for known network protocol traffic (network protocol traffic that can be collected and labeled publicly) in multiple application scenarios; an identification algorithm for encrypted unknown network protocol traffic (network protocol traffic generated by private and zero-day applications); and the collaborative combination of network protocol traffic identification models under mixed identification requirements. Finally, a network protocol traffic collaborative identification system is designed and developed. The main contents are as follows:

(1) Research on the known network protocol traffic identification algorithm in multiple application scenarios: Existing network protocol traffic identification algorithms focus on specific application scenarios and are difficult to apply across multiple scenarios. To address this, a deep learning identification algorithm that fuses a Convolutional Neural Network with a Long Short-Term Memory network (CNN+LSTM) is proposed and compared with 5 typical machine learning identification algorithms based on flow statistical characteristics (SVM, KNN, XGBoost, LightGBM, CatBoost). Machine learning identification algorithms based on flow statistical characteristics need to re-extract and re-select flow characteristics for each scenario. To avoid this, the CNN+LSTM algorithm uses the CNN to automatically extract the traffic characteristics of known network protocols and introduces the LSTM to mine the dependencies between network traffic sequences, which improves applicability across application scenarios. To verify this applicability, 3 types of application scenarios are designed according to the granularity of the traffic identification requirements: identifying VPN traffic (coarse), identifying application programs (medium), and identifying protocol terminals (fine). The results show that, compared with the 5 typical machine learning identification algorithms based on flow statistical features, the proposed CNN+LSTM achieves an identification accuracy of no less than 84% in the 3 application scenarios and has clear time advantages in feature extraction and model tuning. It can therefore serve as a reference when selecting an algorithm for known network protocol traffic identification in multiple application scenarios.

(2) Research on the encrypted unknown network protocol traffic identification algorithm: Unknown network protocols are mostly encrypted, so flow characteristics are insufficient to represent them, and unlabeled data is difficult to use in training. To address this, an unknown network protocol traffic identification algorithm based on the self-supervised learning method JigClu is proposed. First, the algorithm extracts unknown network protocol traffic features based on surge periods (the times at which network traffic density or bandwidth utilization surges), which increases the amount of encrypted network protocol information contained in the flow features and strengthens the expressive power of encrypted unknown network protocol flow features. Second, the unknown network protocol traffic identification model is trained with the self-supervised learning method JigClu, which solves the problem that unlabeled unknown network protocol traffic is difficult to use in training. The experimental results show that the proposed algorithm achieves an average identification precision of 75% on the public dataset ISCXVPN 2016, thereby identifying encrypted unknown network protocol traffic.

(3) Research on the collaborative combination of network protocol traffic identification models under mixed identification requirements: Under mixed requirements with different granularities and hierarchical relationships, a single network protocol traffic identification model is difficult to train and identifies traffic inefficiently. To address this, a collaborative combination method for network protocol traffic identification models based on a tree hierarchical structure is proposed. Using the algorithms proposed in (1) and (2), suitable network protocol traffic identification models are trained for the different requirements. Based on the granularity of each requirement, a hierarchical structure of multiple network protocol traffic identification models is established, and combining these hierarchical structures yields a tree hierarchical structure. Because the upper models filter out samples that are irrelevant to the lower models, the models collaborate under mixed requirements, which improves the overall identification efficiency. Finally, a network protocol traffic collaborative identification system is developed. The experimental results show that the system reduces computational time by about 13%-50% in most environments while maintaining the precision of traffic identification, which improves the overall efficiency of network protocol traffic identification under mixed requirements.
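
The filtering idea in part (3), where an upper (coarser) model decides which samples are passed down to a lower (finer) model, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the `ModelNode` class, the feature names `mean_pkt_len` and `duration`, and the threshold rules are hypothetical stand-ins for trained identification models, used only to show how a tree of models lets lower models skip samples that an upper model has already ruled out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical flow record: feature name -> value (not the thesis's feature set).
Flow = Dict[str, float]


@dataclass
class ModelNode:
    """One identification model in the tree (illustrative only)."""
    name: str
    predict: Callable[[Flow], str]            # stand-in for a trained model
    children: Dict[str, "ModelNode"] = field(default_factory=dict)
    calls: int = 0                            # how many flows reached this model

    def identify(self, flow: Flow) -> List[str]:
        """Return the label path from this model down to the deepest one reached."""
        self.calls += 1
        label = self.predict(flow)
        child = self.children.get(label)
        if child is None:
            # No finer model is registered for this label: stop here.
            return [label]
        return [label] + child.identify(flow)


# Placeholder rules standing in for trained coarse and medium models.
def vpn_rule(flow: Flow) -> str:
    return "vpn" if flow["mean_pkt_len"] > 500 else "non-vpn"


def app_rule(flow: Flow) -> str:
    return "chat" if flow["duration"] < 10 else "streaming"


app_model = ModelNode("application", app_rule)
# Only flows labelled "vpn" by the upper model are passed to the application model.
root = ModelNode("vpn", vpn_rule, children={"vpn": app_model})

flows = [
    {"mean_pkt_len": 800, "duration": 5},
    {"mean_pkt_len": 200, "duration": 5},
    {"mean_pkt_len": 900, "duration": 50},
]
results = [root.identify(f) for f in flows]
```

In this toy run the upper model sees all three flows while the application model sees only the two labelled `vpn`, which is the source of the time saving the abstract attributes to filtering. How much time is saved in practice depends on the real models and traffic mix, which the sketch does not attempt to reproduce.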
Keywords/Search Tags: Network protocol traffic identification, unknown network protocol, machine learning, self-supervised learning, tree hierarchical structure