| In recent years, P2P applications for file sharing, instant messaging and streaming have developed rapidly, and their traffic has become one of the most prevalent kinds of network traffic, one that cannot be ignored. According to related surveys, P2P traffic has accounted for more than 60 percent of all network traffic since 2004. At present, it is difficult to handle this situation simply by increasing network capacity, so P2P traffic identification and filtering have become the most effective solutions and an important subject of research. Based on the working principles and implementation mechanisms of P2P traffic identification, the thesis studies the open issues in the field and the technologies that can identify P2P traffic effectively. The main research work and innovations are as follows: Firstly, starting from the working principles of P2P traffic identification, the thesis summarizes the existing identification technologies, including port-based identification, deep packet inspection, machine learning-based classification and identification based on network behavior. It analyzes their advantages and disadvantages and proposes that machine learning-based technology can be used for P2P traffic identification. Then, the thesis focuses on machine learning methods for P2P traffic classification. Using feature selection algorithms, it puts forward a process for evaluating the performance of four machine learning algorithms. The experiments lead to the conclusion that the decision tree-based algorithm performs best of the four on very large datasets. Finally, the thesis analyzes the characteristics of P2P traffic in depth and chooses four feature attributes that distinguish P2P traffic significantly. Data are collected on a large scale, and the training and testing datasets are produced by a traffic processing module. 
By analyzing performance on the four attributes in combination with an improved VFDT (Very Fast Decision Tree) classification method, the thesis demonstrates the validity of the improved VFDT algorithm and obtains a decision tree model for P2P traffic identification. Although P2P technology is mature, identification accuracy still has shortcomings. P2P traffic identification technology not only provides effective solutions for Internet service providers, but also addresses the problems of other business users whose bandwidth is consumed by P2P users. It is an important step toward giving Internet users a more pleasant experience on the web. |
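
For readers unfamiliar with VFDT, the following minimal sketch illustrates the split test used by the standard algorithm, which relies on the Hoeffding bound to decide when enough flow samples have been seen at a leaf. It is not the improved variant developed in the thesis, whose details are not given in this abstract. The function names, the default `delta` and `tie_threshold` values, and the use of information gain with two classes (P2P and non-P2P) are assumptions made for illustration only.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within this distance of the
    mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 n_classes: int = 2, delta: float = 1e-7,
                 tie_threshold: float = 0.05) -> bool:
    """Standard VFDT split test (hypothetical helper, illustrative defaults).

    Split a leaf when the best attribute beats the runner-up by more than
    the Hoeffding bound, or when the bound is so small that the two
    attributes are effectively tied."""
    # Information gain is bounded by log2(number of classes).
    value_range = math.log2(n_classes)
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # Clear winner after 1000 flows at the leaf: split.
    print(should_split(0.30, 0.10, n=1000))
    # Close contest after only 200 flows: keep collecting samples.
    print(should_split(0.30, 0.28, n=200))
```

Because the bound shrinks as more flows arrive, a leaf that cannot be split confidently now may become splittable later, which is what lets the tree be built incrementally over a large traffic stream.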