
Research On Open Set Network Traffic Classification Based On Machine Learning

Posted on: 2024-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Wu
GTID: 2558307136992879
Subject: Electronic information
Abstract/Summary:
Network traffic classification plays an important role in network management. It is one of the basic technologies for network quality-of-service assurance and network security. With the rapid development of the Internet and multimedia technology, the number of traffic types keeps growing, and new network applications emerge constantly. However, traditional classification techniques assume a closed environment and cannot recognize samples of unknown classes; such samples are misclassified, which lowers classification accuracy. Open set recognition of network traffic removes the closed-environment restriction: it classifies the known classes and also detects unknown traffic. Existing open set traffic recognition methods focus mainly on new class detection, while their overall performance and classification speed still need improvement. This thesis therefore studies open set recognition methods for network traffic. The main work consists of the following three parts:

(1) To find confidence characteristics that can distinguish different samples, this thesis analyzes the confidence distributions of known and new classes from three perspectives: the confidence itself, the absolute difference of intermediate confidences, and the maximum confidence difference. Comparing the frequency distributions of the three shows that the maximum confidence difference can separate known classes from new classes. The classification threshold is then selected based on the difference between these frequency distributions. Experiments show that the maximum confidence difference is effective for new class detection across different combinations of known and new classes, and that the selected threshold generalizes to the different mixed datasets used in the experiments.

(2) To overcome the limitations of threshold-based unknown traffic detection and to better detect new class samples that are hard to identify, a new strategy is proposed that generates pseudo negative samples from unlabeled data. Five algorithms are first designed from different angles using different methods. They are then compared experimentally, and the algorithm that filters out the most known class samples while retaining the most unlabeled traffic samples is selected as the optimal algorithm and applied. Experiments show that, compared with adversarial sample generation, the optimal algorithm consumes fewer computing resources while still preserving the training effect of the model.

(3) Two unknown traffic detection frameworks based on confidence information and a cascade structure are designed. The first framework first detects new class samples with high confidence, then uses the maximum confidence difference threshold to separate the remaining new and known class samples, and finally uses the maximum confidence to classify the known classes. The second framework first uses the maximum confidence difference threshold to split the test samples into two parts. It detects new class samples that may be mixed into the part meeting the threshold condition, then detects known class samples that may be mixed into the part not meeting it, and finally uses the maximum confidence to classify the known classes. In addition, an easily updated detection framework is proposed, consisting of a binary classifier trained on a single known class and a series of filtered pseudo negative samples. Experiments show that the first cascade structure performs better, and that its classification and time performance are superior to those of representative methods in the literature. The easily updated framework also outperforms representative methods in the literature at different update stages. Finally, the strengths and weaknesses of the first structure and the easily updated framework are compared and discussed, and a structure combining the two is designed.
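Part (1) relies on a "maximum confidence difference" score and a threshold chosen from frequency distributions. The abstract does not give either formula, so the sketch below makes two labeled assumptions: the score is the top-1 minus top-2 class confidence of a closed-set classifier, and the threshold is the candidate bin edge that maximizes the share of known samples at or above it minus the share of new samples at or above it. Both function names are hypothetical.

```python
import numpy as np


def max_confidence_difference(probs: np.ndarray) -> np.ndarray:
    """Gap between the largest and second-largest class confidence per sample.

    ASSUMPTION: "maximum confidence difference" is read as top-1 minus top-2
    confidence; the thesis abstract does not state the exact definition.
    """
    top2 = np.sort(probs, axis=1)[:, -2:]  # ascending sort: column 1 is top-1
    return top2[:, 1] - top2[:, 0]


def select_threshold(known_scores: np.ndarray, new_scores: np.ndarray,
                     n_bins: int = 20) -> float:
    """Pick the bin edge where the two score distributions separate best.

    ASSUMPTION: "best" means maximizing
    (share of known samples >= t) - (share of new samples >= t).
    This is one illustrative criterion, not necessarily the thesis's rule.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)  # candidate thresholds in [0, 1]
    best_t, best_gap = float(edges[0]), -np.inf
    for t in edges:
        gap = np.mean(known_scores >= t) - np.mean(new_scores >= t)
        if gap > best_gap:  # keep the first edge reaching the largest gap
            best_t, best_gap = float(t), gap
    return best_t
```

At prediction time, a sample whose score falls below the selected threshold would be treated as a new class candidate. Confident known-class predictions tend to have a large gap, while flat confidence vectors have a small one.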
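Part (2) selects, among five candidate algorithms, the one that filters out the most known class samples while keeping the most unlabeled samples as pseudo negatives. The five algorithms are not described in the abstract. The sketch below is therefore one hypothetical filtering rule: drop an unlabeled sample when a known-class classifier is at least `tau` confident about it. It also includes an evaluation helper that mirrors the stated selection criterion on a labeled benchmark. The names `filter_pseudo_negatives`, `filter_quality`, and `tau` are illustrative.

```python
import numpy as np


def filter_pseudo_negatives(probs: np.ndarray, tau: float = 0.9) -> np.ndarray:
    """Boolean mask of unlabeled samples kept as pseudo negatives.

    HYPOTHETICAL rule: a sample is dropped when the known-class classifier's
    top confidence is >= tau (it probably belongs to a known class); all other
    samples are kept as stand-ins for unknown traffic.
    """
    return probs.max(axis=1) < tau


def filter_quality(keep: np.ndarray, is_known: np.ndarray) -> tuple[float, float]:
    """Evaluation-only metrics matching the selection criterion in part (2).

    Returns (share of known-class samples filtered out,
             share of truly unknown samples retained).
    Needs ground-truth labels, so it only applies to a labeled benchmark.
    """
    known_filtered = float(np.mean(~keep[is_known])) if is_known.any() else 0.0
    unknown_kept = float(np.mean(keep[~is_known])) if (~is_known).any() else 0.0
    return known_filtered, unknown_kept
```

Comparing candidate filters by these two rates, instead of generating adversarial samples, needs only one forward pass of the classifier over the unlabeled pool. That is consistent with the resource saving the abstract reports, though the actual algorithms may differ.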
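The first cascade framework in part (3) applies three stages in order: confident new-class detection, a maximum confidence difference threshold, and maximum-confidence classification of the remaining known-class samples. The sketch below only reproduces that ordering, under the following assumptions. `novelty_probs` comes from a separate binary new-vs-known detector, the confidence difference is again top-1 minus top-2, and the defaults for `novelty_conf` and `mcd_threshold` are placeholders rather than values from the thesis. The function name and the `-1` label for unknown traffic are also illustrative.

```python
import numpy as np

UNKNOWN = -1  # illustrative label for new-class (unknown) traffic


def cascade_predict(class_probs, novelty_probs, novelty_conf=0.9, mcd_threshold=0.5):
    """Three-stage cascade sketch following the first framework's ordering.

    ASSUMPTIONS (not given in the abstract):
      * novelty_probs[i] is the output of a separate binary new-vs-known
        detector, e.g. one trained with pseudo negatives;
      * stage 1 marks a sample as new when novelty_probs[i] >= novelty_conf;
      * stage 2 marks a remaining sample as new when its top-1 minus top-2
        class confidence is below mcd_threshold;
      * default parameter values are placeholders.
    """
    class_probs = np.asarray(class_probs, dtype=float)
    novelty_probs = np.asarray(novelty_probs, dtype=float)
    # Stage 3 labels: argmax over known classes (maximum confidence); these
    # survive only for samples not marked as new by stages 1 and 2.
    labels = class_probs.argmax(axis=1)
    top2 = np.sort(class_probs, axis=1)[:, -2:]
    mcd = top2[:, 1] - top2[:, 0]
    # Stage 1: new-class samples detected with high confidence.
    stage1_new = novelty_probs >= novelty_conf
    # Stage 2: among the remaining samples, a small confidence gap means new class.
    stage2_new = ~stage1_new & (mcd < mcd_threshold)
    labels[stage1_new | stage2_new] = UNKNOWN
    return labels
```

Because each stage only touches the samples the previous stage left undecided, cheap checks can run first. This reflects the cascade idea, although the thesis's concrete detectors are not specified here.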
Keywords/Search Tags: network traffic classification, open set recognition, confidence information, unlabeled data, cascade classification