| Unknown protocol discovery and traffic classification is a branch of network traffic classification. It aims to mine new protocols and applications from massive amounts of network data and to classify mixed traffic that contains a variety of protocols. Compared with known protocols, the specifications of unknown protocols are undocumented and few samples are available, both of which hinder recognition; classifying the traffic of unknown protocols is therefore a major theoretical and technical challenge. This paper studies network traffic classification for unknown protocols with the aim of improving their recognition rate. It focuses on two problems: extracting characteristic strings from the packets of unknown protocols, and increasing the number of network traffic samples of rare classes. An improved PrefixSpan algorithm is proposed to extract characteristic strings from the packets of a single class; an up-sampling algorithm based on a kernel function model is proposed to increase the number of network flow samples of rare classes; and an offline network traffic classification system is designed and implemented. The specific work of this paper is as follows: 1. Because unknown protocols lack documented specifications, among other difficulties, a new automated method based on an improved PrefixSpan algorithm is proposed to extract their characteristic strings. The sample set is divided into subspaces according to frequent items, and each sequence in each subspace is scanned byte by byte to obtain frequent strings. The frequent strings are then merged according to the characteristics of the packet format to yield characteristic strings, which can be used to classify new traffic. Experimental results show that messages of unknown protocols are classified with high accuracy using the characteristic strings extracted by this method. 2. When an unknown protocol first comes into use, few samples of its traffic are available, which hinders recognition; in traffic classification based on statistical features, the traffic of unknown protocols is therefore always treated as samples of rare classes. To increase the number of samples of small classes, an up-sampling method based on a kernel function model is presented. The small-class samples are assumed to follow a mixture distribution of kernel functions, so their density function can be estimated from the distribution of the existing samples, and a given number of new small-class samples can then be generated from the estimate. Building on this up-sampling method, a network traffic classification method based on Cluster-Map is proposed. Theoretical analysis and experimental results show that the algorithm improves the recognition rate of small-class samples. 3. An offline network traffic classification system is designed and implemented in C#. The system extracts network flows from network packets, up-samples the samples of rare classes, and classifies network traffic offline. |
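
The kernel-based up-sampling idea in item 2 can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not name the kernel or the bandwidth rule, so a Gaussian kernel and Silverman's rule of thumb are assumed here, and the names `silverman_bandwidths` and `kernel_upsample` as well as the example feature values are hypothetical. Under these assumptions, drawing from the estimated density amounts to picking an existing rare-class sample at random and perturbing it with Gaussian noise.

```python
import math
import random


def silverman_bandwidths(samples):
    """Per-feature bandwidths from Silverman's rule of thumb (an assumed choice)."""
    n = len(samples)
    dims = len(samples[0])
    factor = (4.0 / (dims + 2)) ** (1.0 / (dims + 4)) * n ** (-1.0 / (dims + 4))
    bandwidths = []
    for d in range(dims):
        column = [s[d] for s in samples]
        mean = sum(column) / n
        # Sample variance; max(...) avoids division by zero for a single sample.
        var = sum((x - mean) ** 2 for x in column) / max(n - 1, 1)
        bandwidths.append(factor * math.sqrt(var))
    return bandwidths


def kernel_upsample(samples, n_new, rng=None):
    """Draw n_new synthetic samples from a Gaussian kernel density estimate.

    Sampling from the estimate is equivalent to choosing an existing sample
    uniformly at random and adding Gaussian noise scaled by the bandwidth.
    """
    if not samples:
        raise ValueError("need at least one rare-class sample")
    rng = rng or random.Random()
    bandwidths = silverman_bandwidths(samples)
    synthetic = []
    for _ in range(n_new):
        centre = rng.choice(samples)
        synthetic.append([rng.gauss(x, h) for x, h in zip(centre, bandwidths)])
    return synthetic


# Hypothetical flow features (e.g. mean packet size, up/down byte ratio).
rare_flows = [[120.0, 0.8], [135.0, 0.7], [110.0, 0.9]]
new_flows = kernel_upsample(rare_flows, 5, random.Random(0))
```

The generated flows would be appended to the rare class before training a classifier. Passing a seeded `random.Random` keeps the synthetic set reproducible between runs.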