| In recent years, cybersecurity has become an active area of research, as security incidents such as data breaches, attacks on critical infrastructure, and cryptojacking have become increasingly common. With the spread of the Internet of Things, billions of devices are connected to the Internet, giving attackers more opportunities to exploit. To curb network attacks and malicious network behavior, researchers have begun to study traffic identification, which is the most important step in ensuring network security; effective identification methods can improve the accuracy of network behavior analysis and anomaly detection. With the rapid development of neural networks, many researchers have studied how neural networks can be applied to network security to detect attacks or build security solutions, but most of these solutions still have room for improvement. Real-world traffic identification faces multiple challenges that are not apparent in a controlled laboratory environment. In the real-world prediction phase, recognition systems must deal with a large number of samples from unknown classes. Most existing traffic recognition schemes do not consider unknown samples, so many unknown samples are incorrectly identified as known samples in real-world applications, which can be fatal for a security system. Real data is inherently dynamic: new unknown inputs can be ignored or rejected by the classifier through additional steps, or a new classifier can be designed that continuously detects new inputs and processes the unknown inputs accordingly. Therefore, in this paper, we investigate network traffic identification based on open-set classification. The main contributions are as follows: 1. For feature processing, this paper proposes an efficient feature extraction method based on multi-log correlation. First, we introduce the dataset we use, XDU-MET2020, and preprocess the raw data; then, we analyze the
statistical patterns of network traffic, plaintext TLS (Transport Layer Security) handshake packets, and contextual data, three dimensions in total, and represent the raw traffic with features from these three dimensions. Finally, we design the multi-log correlation process: according to the feature information of the three dimensions, it writes the information in each flow to the corresponding log file to form the original data structure, then aggregates all the logs by hash index and consolidates all the traffic features. This feature extraction method improves data richness, traceability, and feature expressiveness. 2. For open-set recognition of network traffic, this paper analyzes the defects of existing models and, inspired by the OpenMax model, proposes a network traffic recognition model based on open-set classification. The known-class detection stage first detects the known classes and simultaneously extracts the activation values of the softmax and OpenMax layers in the classification model to construct a two-dimensional feature vector as the input to the next stage. The unknown-class detection stage builds a one-class support vector machine model for each class based on the output of the first stage. If an input is identified as an outlier during testing, it is classified as belonging to an unknown class, and we introduce incremental learning methods so that these unknown classes can be reused. Then, by setting a mean restriction, we prevent samples from being classified as an unknown class with high confidence. Finally, we show that the open-set recognition (OSR) model has a theoretically bounded open-space risk, which formally provides an effective solution for identifying unknown network traffic. 3. To evaluate the results, this paper conducts comparison experiments on the public XDU-MET2020 and ISCXVPN2016 datasets. On the one hand, we compare performance with the existing open-set classifiers OpenMax and DOC; on the other hand, we extract different layers and elements in the unknown-class detection stage to construct different one-class support vector machine models, and combine them into different models for comparison experiments. We then optimize model performance by tuning the hyperparameter values. Finally, by analyzing the experimental results, we find that the open-set classification model in this paper outperforms DOC. |
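
The multi-log correlation step in contribution 1 (per-dimension logs aggregated by hash index) can be illustrated with a minimal sketch. The abstract does not specify how the hash is computed or which fields each log holds, so everything here is an assumption for illustration: the SHA-1 of the flow 5-tuple as the index, the helper names `flow_hash` and `correlate_logs`, and all field names and values.

```python
import hashlib
from collections import defaultdict


def flow_hash(src_ip, src_port, dst_ip, dst_port, proto):
    """Hypothetical hash index: SHA-1 of the flow 5-tuple (an assumption)."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{proto}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def correlate_logs(*logs):
    """Merge records from several logs into one feature dict per hash index."""
    merged = defaultdict(dict)
    for log in logs:
        for record in log:
            # Copy every field except the index itself into the flow's features.
            fields = {k: v for k, v in record.items() if k != "hash"}
            merged[record["hash"]].update(fields)
    return dict(merged)


# One illustrative flow, described by three logs (one per feature dimension).
h = flow_hash("10.0.0.5", 51234, "93.184.216.34", 443, "tcp")
stats_log = [{"hash": h, "pkt_count": 42, "mean_pkt_len": 512.3}]
tls_log = [{"hash": h, "tls_version": "TLSv1.2", "cipher_suite_count": 17}]
context_log = [{"hash": h, "dns_query": "example.com"}]

features = correlate_logs(stats_log, tls_log, context_log)
```

Keying every log by the same index means each dimension can be written independently and joined later, which is one plausible reading of the traceability benefit the abstract mentions.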