Research on Encrypted Traffic Identification Based on Text-GAN

Posted on: 2021-04-19  Degree: Master  Type: Thesis
Country: China  Candidate: P Liu  Full Text: PDF
GTID: 2428330614965730  Subject: Logistics Engineering
Abstract/Summary:
As mobile phones become ever more widespread, more and more mobile applications are entering the market. When users run these applications, they generate large volumes of traffic that record their operations. Analyzing this application traffic reveals users' operating habits, the categories of applications they use, and other information of considerable value. However, as traffic encryption technology has developed, more and more companies encrypt the traffic their users generate, which makes identification difficult. Traditional application traffic identification methods, such as those based on port numbers or application-layer protocol signatures, cannot be used on encrypted traffic.

Researchers have therefore increasingly turned to machine learning and deep learning for encrypted traffic identification. These methods overcome the limitation of the traditional approaches, but unlike the simple features those approaches rely on, they need large amounts of data so that the model can learn the characteristics of each kind of traffic. In addition, the training data should be as balanced across classes as possible to achieve good training results. Capturing and labeling traffic data, however, is very time-consuming. Moreover, because different applications have different numbers of users, the amount of traffic they produce varies widely, which leads to class imbalance.

To address this, this thesis proposes a traffic identification system based on a generative adversarial network (GAN). First, to deal with the imbalanced traffic data set, a Text-GAN based on self-attention is used to expand and balance the traffic data. The balanced traffic data is then classified with a long short-term memory (LSTM) network. The model is trained and validated on the public ISCX VPN-nonVPN traffic dataset, where it reaches an accuracy of 0.9948, a recall of 0.9937, and an F1 score of 0.9937. Compared with the traditional MLP method, the model improves significantly on all three evaluation metrics. The method is further applied to manually captured encrypted traffic from an e-commerce app to identify user behavior in that traffic.

The main innovations of this thesis are as follows:
1. A Text-GAN traffic generation method based on self-attention is designed. The self-attention mechanism, which supports parallel computation, replaces the LSTM layer in the original generator network, which cannot be computed in parallel, in order to improve the speed and quality of traffic data generation.
2. The balanced traffic data is classified with an LSTM network. Compared with earlier neural networks such as MLP, LSTM takes into account the sequential order of the data in the traffic, which makes it better suited to traffic classification.
3. Traffic identification is extended from application identification to user behavior identification, demonstrating the generality of the overall system. Real user app traffic collected from a live network is identified and analyzed, and the results are evaluated by accuracy, recall, and F1 score.
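The abstract's concern with class imbalance can be illustrated with a small sketch of the three reported metrics. The example below is not taken from the thesis: the class labels (`chat`, `voip`), the sample counts, and the use of macro averaging are assumptions chosen for illustration, since the abstract does not state how recall and F1 were averaged. It suggests why a classifier trained on imbalanced traffic can look accurate while still failing on the minority class.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the flow was not p
            fn[t] += 1  # flow was t, but it was missed
    scores = {}
    for label in sorted(set(y_true)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of flows whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_average(scores, index):
    """Unweighted mean over classes of one score (1 = recall, 2 = F1)."""
    return sum(s[index] for s in scores.values()) / len(scores)


# Hypothetical imbalanced test set: 95 "chat" flows and 5 "voip" flows.
y_true = ["chat"] * 95 + ["voip"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["chat"] * 100

scores = per_class_scores(y_true, y_pred)
acc = accuracy(y_true, y_pred)
macro_recall = macro_average(scores, 1)
macro_f1 = macro_average(scores, 2)

print(f"accuracy={acc:.4f} macro_recall={macro_recall:.4f} macro_f1={macro_f1:.4f}")
```

In this hypothetical case accuracy is 0.95 even though no minority-class flow is ever identified, while macro recall drops to 0.5. This is one reason to report recall and F1 alongside accuracy, and to balance the classes before training.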
Keywords/Search Tags: self-attention, Text-GAN, LSTM, encrypted traffic, user behavior, traffic classification, data expansion, data set balancing