Research On SSL/TLS Encrypted Traffic Identification Method Based On Markov Model

Posted on: 2023-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Luo
Full Text: PDF
GTID: 2558307061950979
Subject: Computer technology
Abstract/Summary:
In recent years, users have paid increasing attention to the security of their own data. Website and application developers increasingly use encryption protocols to protect transmitted traffic and user privacy; at the same time, malicious traffic also uses encryption protocols to evade security detection. Together, these trends have sharply increased the proportion of encrypted traffic on the Internet. Traffic encryption renders traditional port-based and deep packet inspection-based traffic identification schemes ineffective. Because HTTPS is built on SSL/TLS, SSL/TLS encrypted traffic accounts for a large share of network traffic. The changes introduced in TLS 1.3 cause existing Markov chain-based encrypted traffic identification schemes to perform poorly. On the one hand, the greater extent of encryption in TLS reduces the number of message types that can be used to build Markov chains, which directly degrades classification performance. On the other hand, if the Markov model is built directly from the entire session flow, long session flows make identification computationally heavy and slow. To address these two problems, a cluster-enhanced Markov model and a fusion feature model are proposed, and on this basis an encrypted traffic identification system is implemented. The details are as follows.

1) To address the reduction in the number of Markov chain states caused by TLS 1.3, a cluster-enhanced Markov model is proposed for traffic identification. The model clusters TLS messages of type ApplicationData by length, defines the resulting clusters as message subtypes, and uses these subtypes in place of the original message type to increase the number of states available to the Markov chain. This resolves the shortage of available states brought about by TLS 1.3 and makes the model compatible with TLS 1.3. Comparative experiments show that when the number of clusters is set to 9, the macro-F-measure of the proposed cluster-enhanced Markov model reaches 0.925, well above the 0.737 obtained by applying Markov chain modeling directly. To address the computational cost and delay of constructing a Markov chain from the entire session flow, the proposed model uses only the first part of the session flow for modeling. Experiments show that, using only this first part, the cluster-enhanced Markov model can still identify the application corresponding to a flow with high accuracy: when the flow length limit is set to 30, the macro-F-measure reaches 0.894. Setting the flow length limit also greatly reduces the per-flow feature extraction and test times: feature extraction time falls from 13.5 ms to 4.8 ms, and test time falls from 8.6 ms to 4.4 ms.

2) Because the cluster-enhanced Markov model cannot reliably distinguish similar short flows in some applications, a fusion feature model is proposed that incorporates count features of key ApplicationData messages in a TLS flow. In the fusion feature model, the relative probability vector output by the Markov model, which gives the relative probability that the flow belongs to each application, is fused with a vector of the counts of key messages in the flow, and the fused features are used for traffic identification. Experimental results show that the proposed method resolves the problem of similar flows being indistinguishable. When the flow length limit is set to 30, the macro-F-measure of the random forest-based fusion feature model reaches 0.982, higher than the 0.894 of the cluster-enhanced Markov model. A comparison of the fusion feature model with two other methods, direct Markov chain modeling and MaMPF, shows that the fusion feature model has the best classification performance and, when the flow length limit is set, the shortest per-flow feature extraction time.

Based on the proposed fusion feature model, a TLS encrypted traffic identification system is implemented. The system identifies, with high accuracy, the applications to which the traffic in a pcap file belongs, and it can be updated promptly when new applications appear in the environment.
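The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: every function name is hypothetical, and several details the abstract leaves open are filled with labeled assumptions. A simple 1-D k-means stands in for the unspecified clustering method, transition probabilities use Laplace smoothing, the relative probability vector is obtained by normalizing per-application likelihoods, and "key messages" are taken to mean the ApplicationData subtypes. The random forest classifier that would consume the fused vector is omitted.

```python
import math
from collections import Counter, defaultdict


def nearest(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda j: abs(value - centers[j]))


def kmeans_1d(values, k, iters=50):
    """Assumed stand-in for the thesis clustering: Lloyd's algorithm on lengths."""
    vals = sorted(values)
    # Deterministic initialization from evenly spaced order statistics.
    centers = [vals[(2 * i + 1) * len(vals) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vals:
            groups[nearest(v, centers)].append(v)
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def to_states(flow, centers, limit=30):
    """Turn (message_type, length) pairs into Markov states.

    Only the first `limit` messages are kept (the flow length limit), and
    ApplicationData messages are replaced by a length-cluster subtype.
    """
    states = []
    for msg_type, length in flow[:limit]:
        if msg_type == "ApplicationData":
            states.append(f"ApplicationData#{nearest(length, centers)}")
        else:
            states.append(msg_type)
    return states


def train_chain(state_seqs):
    """Count start states and first-order transitions for one application."""
    start, trans = Counter(), defaultdict(Counter)
    for seq in state_seqs:
        if not seq:
            continue
        start[seq[0]] += 1
        for a, b in zip(seq, seq[1:]):
            trans[a][b] += 1
    return start, trans


def log_likelihood(seq, chain, n_states, alpha=1.0):
    """Log-probability of seq under a chain, with Laplace smoothing (assumed)."""
    if not seq:
        return 0.0
    start, trans = chain
    ll = math.log((start[seq[0]] + alpha) / (sum(start.values()) + alpha * n_states))
    for a, b in zip(seq, seq[1:]):
        row = trans.get(a, Counter())
        ll += math.log((row[b] + alpha) / (sum(row.values()) + alpha * n_states))
    return ll


def relative_probabilities(seq, chains, n_states):
    """Normalize per-application likelihoods so they sum to 1 (assumed form)."""
    apps = sorted(chains)
    lls = [log_likelihood(seq, chains[a], n_states) for a in apps]
    top = max(lls)  # subtract the maximum for numerical stability
    weights = [math.exp(ll - top) for ll in lls]
    total = sum(weights)
    return {a: w / total for a, w in zip(apps, weights)}


def fusion_features(seq, chains, n_states, k):
    """Concatenate relative probabilities with per-subtype message counts."""
    probs = relative_probabilities(seq, chains, n_states)
    counts = Counter(seq)
    subtype_counts = [counts[f"ApplicationData#{j}"] for j in range(k)]
    return [probs[a] for a in sorted(chains)] + subtype_counts
```

In this sketch, one chain is trained per application, and an unseen flow is scored against every chain. The fused vector returned by `fusion_features` would then be passed to a classifier such as the random forest named in the abstract.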
Keywords/Search Tags: SSL/TLS, Encrypted Traffic Identification, Markov Chain, Power Law Distribution, Random Forest