In recent years, the rapid development of Internet technology has been accompanied by a growing number of Internet security issues. Malware traffic classification (MTC) is a key technology in Internet security. Traditional MTC methods rely on manually designed features and achieve low accuracy. Deep learning methods achieve high classification accuracy, but their performance typically depends on a large number of labeled samples, and labeling large volumes of data in practice incurs substantial human cost. MTC under small-sample conditions has therefore become an active research area. Because MTC typically involves few labeled samples and many unlabeled ones, this paper draws on semi-supervised learning (SSL), transfer learning (TL), domain adaptation (DA) and related techniques, and combines theoretical analysis of model structures with experimental simulation to study malware traffic classification techniques and their application when labeled samples are scarce. The work consists of three parts.

(1) To address the poor classification performance of supervised deep learning methods when labeled samples are scarce, this paper uses SSL and TL to improve traditional machine learning and deep learning methods. It designs a semi-supervised model based on a convolutional autoencoder (CAE) and random forest (RF), and a deep learning model based on TL and a convolutional neural network (CNN), and evaluates the performance of both. The experimental results show that the deep learning method outperforms the traditional machine learning method on the network traffic classification task, and that SSL and TL improve the model's classification performance when labeled samples are scarce.

(2) To make better use of the large number of unlabeled samples, this paper proposes combining a CNN with Laddernet, a classical semi-supervised network. To further improve network traffic classification with few labeled samples, it then uses TL to transfer knowledge from software traffic data in other domains, enlarging the feature distribution space available to the model during training so that the trained model achieves higher classification accuracy and better generalization. Experimental results show that, compared with traditional methods, combining a CNN with Laddernet greatly improves classification performance with small samples, and that adding TL further improves classification accuracy.

(3) Although TL can integrate knowledge from different data domains, a mismatch between the source-domain and target-domain data distributions degrades the trained model. To solve this problem, this paper uses DA to map the features of the source-domain and target-domain data into the same Hilbert space, strengthening their correlation and thereby improving the effectiveness of TL when the two distributions differ greatly. Experimental results show that adding DA effectively strengthens the connection between the source-domain and target-domain feature distributions, and that the resulting model achieves higher classification performance under small-labeled-sample conditions than models using SSL and TL alone.
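The abstract does not state how the source-domain and target-domain features are compared once mapped into a Hilbert space. One common choice, assumed here rather than taken from this work, is the maximum mean discrepancy (MMD): the distance between the mean embeddings of the two samples in the reproducing kernel Hilbert space of a kernel such as the Gaussian (RBF) kernel. A DA model of this kind would typically add a weighted MMD term between source and target features to its classification loss, so that minimizing the loss also pulls the two feature distributions together. The minimal sketch below computes a biased estimate of squared MMD in plain Python; the function names `gaussian_kernel` and `mmd2` and the bandwidth parameter `sigma` are hypothetical illustrations, not part of the original method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(source: List[Vector], target: List[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two feature samples.

    Assumption: MMD with a Gaussian kernel stands in for the unspecified
    Hilbert-space mapping described in the abstract.
    MMD^2 = mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)
    """
    m, n = len(source), len(target)
    # Average similarity within the source sample.
    k_ss = sum(gaussian_kernel(a, b, sigma) for a in source for b in source) / (m * m)
    # Average similarity within the target sample.
    k_tt = sum(gaussian_kernel(a, b, sigma) for a in target for b in target) / (n * n)
    # Average cross-domain similarity between source and target.
    k_st = sum(gaussian_kernel(a, b, sigma) for a in source for b in target) / (m * n)
    return k_ss + k_tt - 2.0 * k_st
```

For identical samples, `mmd2(s, s)` returns 0.0; a target sample shifted far from the source yields a value approaching 2, the upper bound when the kernel takes values in [0, 1]. In a training loop, this quantity would be computed on the extracted features of a source batch and a target batch and added to the classification loss with a tunable weight.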