| Nowadays, Peer-to-Peer (P2P) applications have become the largest consumers of network bandwidth. As the representative P2P protocol, BitTorrent (BT) accounts for over sixty percent of all P2P traffic and continues to grow steadily. Now that network traffic exhibits big-data characteristics, merely determining whether BT flows are present is too coarse. A more thorough identification and classification, one that recovers the attributes of Tracker servers and the traffic generated by each client, is necessary. It matters both for managing BT resources and for monitoring how users consume them. This thesis reviews current research on BT traffic identification and analyzes the limitations of existing methods. Based on a detailed analysis of the BT protocol at both the message level and the flow level, it presents a comprehensive BT traffic identification solution that extracts Tracker server attributes and calculates each Peer's bidirectional traffic. To handle the big-data scale of the traffic and to overcome a limitation of previous work, which relied on sampled data or active measurement, this thesis designs a processing chain that implements the solution on the distributed system Hadoop using the MapReduce data processing framework. It then describes the design of a management system for the analysis results, built on the distributed database HBase. By optimizing the HBase table structure and query logic, the system provides efficient storage and multi-dimensional retrieval. Identification results demonstrate the feasibility and benefits of the method, and the thesis also discusses the observed Tracker attributes and flow characteristics in detail. |
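
The core idea described above, identifying BT flows from message content and then summing each Peer's traffic in both directions, maps naturally onto a map/shuffle/reduce pipeline. The Python sketch below simulates such a pipeline in memory. It is an illustration under stated assumptions, not the thesis's implementation. It assumes a hypothetical `Packet` record (endpoints as `ip:port` strings, a captured payload prefix, and the packet size in bytes). Its only identification rule is the plaintext peer-wire handshake signature from the BitTorrent specification: a length byte of 19 followed by `BitTorrent protocol`. The names `peer_traffic`, `map_packet` and `reduce_peer` are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple

# Plaintext peer-wire handshake prefix from the BitTorrent spec (BEP 3):
# one length byte (19) followed by the ASCII string "BitTorrent protocol".
BT_HANDSHAKE = b"\x13BitTorrent protocol"


class Packet(NamedTuple):
    """Hypothetical packet record: 'ip:port' endpoints, payload prefix, size in bytes."""
    src: str
    dst: str
    payload: bytes
    length: int


def is_bt_handshake(payload: bytes) -> bool:
    # Message-level rule: the payload begins with the handshake signature.
    return payload.startswith(BT_HANDSHAKE)


def map_packet(p: Packet) -> Iterator[Tuple[str, Tuple[int, int]]]:
    # Map: credit the sender with upload bytes and the receiver with download bytes.
    yield p.src, (p.length, 0)
    yield p.dst, (0, p.length)


def reduce_peer(values: List[Tuple[int, int]]) -> Tuple[int, int]:
    # Reduce: sum the (upload, download) pairs emitted for one peer.
    return sum(v[0] for v in values), sum(v[1] for v in values)


def peer_traffic(packets: Iterable[Packet]) -> Dict[str, Tuple[int, int]]:
    pkts = list(packets)
    # Flow-level step: a flow (unordered endpoint pair) counts as BT if any of
    # its packets carries the handshake, so later data packets are included too.
    bt_flows = {frozenset((p.src, p.dst)) for p in pkts if is_bt_handshake(p.payload)}
    grouped: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for p in pkts:
        if frozenset((p.src, p.dst)) in bt_flows:
            for peer, value in map_packet(p):
                grouped[peer].append(value)  # shuffle: group values by peer key
    return {peer: reduce_peer(vals) for peer, vals in grouped.items()}


# Sample input: a 68-byte handshake in each direction, one piece message,
# and one unrelated TLS packet that should be ignored.
packets = [
    Packet("10.0.0.1:6881", "10.0.0.2:51413", BT_HANDSHAKE + bytes(48), 68),
    Packet("10.0.0.2:51413", "10.0.0.1:6881", BT_HANDSHAKE + bytes(48), 68),
    Packet("10.0.0.2:51413", "10.0.0.1:6881", b"\x00\x00\x40\x09\x07", 16397),
    Packet("10.0.0.3:443", "10.0.0.1:50000", b"\x16\x03\x01", 1500),
]
print(peer_traffic(packets))
# {'10.0.0.1:6881': (68, 16465), '10.0.0.2:51413': (16465, 68)}
```

Both handshakes and the data packet are attributed to the two peers, and the non-BT packet is skipped. In a Hadoop deployment these stages would presumably run as separate MapReduce jobs over stored traces. A signature match alone also misses connections protected by BitTorrent's Message Stream Encryption, which is one reason flow-level features are likely needed in addition to message-level rules.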