| Since peer-to-peer (P2P) networking emerged in the late 1990s, P2P applications have developed rapidly and spread into many areas of network use. While they offer users great convenience, they also interfere with the normal operation of other network applications. Some P2P applications use dynamic ports and payload encryption to evade traditional traffic identification methods, which makes traffic identification and management difficult. Identifying P2P applications requires a thorough study of P2P technology. Following the development of P2P technology, several P2P search techniques are analyzed from the perspective of topology, and a classic P2P application, eDonkey, is analyzed from the perspective of its protocol. P2P traffic identification is an active research area, and many identification methods have been proposed, including methods based on ports, on traffic characteristics, on transport-layer behavior, and on deep packet inspection (DPI). Each of these methods has drawbacks. Compared with these traditional techniques, the Support Vector Machine (SVM) has shown encouraging results in P2P traffic identification. Because SVM is built on structural risk minimization and statistical learning theory, it is not prone to getting trapped in local optima, and it is particularly well suited to classification problems with small samples. Building on this study of P2P and SVM technology, a framework for SVM-based P2P traffic identification is proposed. The functions, mechanisms, and implementation components of this framework are discussed in the thesis. Traffic features are selected at three levels: the packet level, the flow level, and the network connection level. 
The traffic feature data are preprocessed using the HVDM distance metric for heterogeneous datasets. The size of the test set is estimated using guaranteed estimators, which avoids poor training results caused by too few samples and also reduces training time. Two optimizations are applied during training. First, using fuzzy mathematics, a weighting method is proposed that accounts for the different influence of each traffic feature on classification, which improves the accuracy of P2P traffic identification. Second, "Block Arithmetic" is used, which picks training data by cyclic selection. This allows new traffic features to be learned from a small amount of traffic, and it avoids repeatedly training on large amounts of data that share the same traffic features. The miss rate and the false alarm rate are used as the evaluation criteria in the experiments. The experiments show that SVM-based P2P traffic identification performs well. |
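
To illustrate the HVDM preprocessing step mentioned in the abstract, the following is a minimal sketch of the heterogeneous value difference metric in its commonly cited form: numeric attributes contribute |x - y| / (4σ), nominal attributes contribute a normalized value difference based on class-conditional frequencies, and unknown values contribute 1. The feature names (mean packet size, transport protocol), the toy data, the choice of the root-of-squares variant for nominal attributes, and the handling of unseen nominal values are assumptions made for illustration; the thesis may use different features and settings.

```python
import math
from collections import Counter, defaultdict


def fit_hvdm(rows, labels, nominal):
    """Collect per-attribute statistics from a labelled training set.

    rows    : list of tuples with no missing values (assumption)
    labels  : class label for each row
    nominal : set of attribute indexes that hold nominal values
    """
    classes = sorted(set(labels))
    sigma = {}  # standard deviation of each numeric attribute
    cond = {}   # P(class | value) for each nominal attribute
    for a in range(len(rows[0])):
        col = [r[a] for r in rows]
        if a in nominal:
            counts = defaultdict(Counter)
            for value, label in zip(col, labels):
                counts[value][label] += 1
            cond[a] = {
                value: [cnt[c] / sum(cnt.values()) for c in classes]
                for value, cnt in counts.items()
            }
        else:
            mean = sum(col) / len(col)
            sigma[a] = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    return sigma, cond


def hvdm(x, y, sigma, cond):
    """Heterogeneous distance between two feature vectors."""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0  # unknown value
        elif a in cond:
            px, py = cond[a].get(xa), cond[a].get(ya)
            if px is None or py is None:
                d = 1.0  # unseen nominal value treated as unknown (assumption)
            else:
                d = math.sqrt(sum((p - q) ** 2 for p, q in zip(px, py)))
        else:
            # Numeric difference scaled by four standard deviations.
            d = abs(xa - ya) / (4 * sigma[a]) if sigma[a] > 0 else 0.0
        total += d * d
    return math.sqrt(total)


# Hypothetical flows: (mean packet size in bytes, transport protocol).
rows = [(1200.0, "tcp"), (1400.0, "tcp"), (200.0, "udp"), (100.0, "udp")]
labels = ["p2p", "p2p", "other", "other"]
sigma, cond = fit_hvdm(rows, labels, nominal={1})

print(hvdm(rows[0], rows[1], sigma, cond))  # two flows of the same class
print(hvdm(rows[0], rows[2], sigma, cond))  # flows of different classes
```

Scaling each attribute to a comparable range in this way keeps a single large-valued numeric feature from dominating the distance, which is presumably why such a metric is attractive when packet-level, flow-level, and connection-level features are mixed.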