| TLS (Transport Layer Security) encrypted communication has become the primary channel that remote intrusion attacks use to evade security monitoring by firewalls and intrusion detection systems. Identifying malicious TLS traffic has therefore become a critical research area in network security. Existing research has focused on training identification models with machine learning algorithms on large numbers of malicious TLS traffic samples. However, applying these methods to practical network supervision still faces several challenges. First, to help network administrators develop and update defense strategies in a timely manner, detection methods should identify the specific malware present in the network rather than return a coarse verdict such as "malicious traffic". Second, when a single piece of malware is the object of study, it is often difficult to capture enough of the encrypted TLS traffic it generates, so the large training sets that existing methods require may not be available. To address these challenges, this paper investigates fine-grained detection of malicious TLS traffic when only a limited number of samples is available. The research objectives and scope are as follows. First, this paper constructs a malicious TLS traffic dataset and analyzes its features to gain a precise understanding of its characteristics. Publicly available datasets from www.malware-traffic-analysis.net and Stratosphere are labeled using VirusTotal to remove normal TLS traffic and reduce data interference. The paper then analyzes features of malicious TLS traffic such as packet length, direction, and time interval distribution. Feature distribution interval analysis is carried out on TLS samples produced by different types of
malicious software to distinguish between various types of malicious TLS traffic. Additionally, to confirm the effectiveness of the feature selection, TLS traffic features are converted into RGB images for intuitive observation. This feature selection and analysis lays the foundation for the subsequent research. Next, the paper proposes a method for fine-grained classification of malicious TLS traffic in single-sample scenarios involving encrypted Command and Control (C&C) connections of malware. Once malware has compromised a host, its first step is to establish a TLS connection with its C&C server to obtain further attack instructions, such as lateral movement within the network. As a result, some types of malware generate only a single TLS flow. To achieve fine-grained classification of single-sample malicious TLS traffic, the paper designs and trains a TLS flow feature extractor using a large number of HTTPS flows generated by legitimate websites as training samples. On top of this feature extractor, a feedforward neural network is trained to automatically generate the weight parameters of Support Vector Machines (SVMs). Multiple SVMs are initialized using malicious TLS traffic samples, and a voting strategy is used to classify unknown samples. Experimental results show that the proposed method achieves an accuracy of 76%, significantly higher than the baseline rates of 46.5% and 50%. Finally, this paper investigates the scenario in which malware establishes multiple malicious TLS connections after the C&C communication in order to carry out further attacks such as code downloading and data theft. Specifically, a fine-grained classification method for small-sample malicious TLS traffic is proposed. A model pre-trained on ImageNet is fine-tuned with small-sample data to serve as the final classifier. Based on the TLS traffic feature extractor introduced in Chapter 4, an XGBoost
model is first employed for coarse-grained classification of unknown traffic. Then, the features of the samples identified as malicious TLS traffic are padded and fed to the final classifier for fine-grained classification. The performance of the proposed method is analyzed theoretically and validated through experiments. The experimental results show that classification accuracy increases with sample size, which is consistent with the theoretical analysis. |
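
The abstract mentions converting TLS traffic features (packet length, direction, and time interval) into RGB images but does not give the encoding. The following is a minimal sketch of one possible mapping, not the paper's actual method. The channel assignment, the caps `MAX_LEN` and `MAX_IAT`, the image side `SIDE`, the zero padding, and the function names are all assumptions made for illustration.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
MAX_LEN = 1500   # assumed cap on packet length, in bytes
MAX_IAT = 10.0   # assumed cap on inter-arrival time, in seconds
SIDE = 4         # assumed image side; a flow fills SIDE * SIDE pixels


def packet_to_pixel(length, outbound, iat):
    """Map one packet to a hypothetical (R, G, B) pixel with values in 0..255."""
    # Red: packet length, clipped to MAX_LEN and scaled linearly.
    r = round(255 * min(length, MAX_LEN) / MAX_LEN)
    # Green: direction, 255 for client-to-server and 0 for server-to-client.
    g = 255 if outbound else 0
    # Blue: inter-arrival time, clipped and log-scaled so short gaps stay visible.
    b = round(255 * math.log1p(min(iat, MAX_IAT)) / math.log1p(MAX_IAT))
    return (r, g, b)


def flow_to_image(packets, side=SIDE):
    """Turn a list of (length, outbound, iat) tuples into a side x side pixel grid.

    Flows longer than side * side packets are truncated; shorter flows are
    padded with black pixels so every image has the same shape.
    """
    capacity = side * side
    pixels = [packet_to_pixel(*p) for p in packets[:capacity]]
    pixels += [(0, 0, 0)] * (capacity - len(pixels))
    # Split the flat pixel list into rows.
    return [pixels[i * side:(i + 1) * side] for i in range(side)]


if __name__ == "__main__":
    # Two made-up packets: a full-size outbound packet, then a small inbound one.
    demo_flow = [(1500, True, 0.0), (300, False, 10.0)]
    for row in flow_to_image(demo_flow):
        print(row)
```

A fixed image shape is what lets a classifier pre-trained on natural images accept flows of different lengths, which is why the sketch pads short flows instead of dropping them.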