| De-anonymization refers to techniques that trace the originator of content, network traffic, or application behavior. Normally the source of a stream can be identified from its IP information, but when senders hide their identity through multiple measures, finding the source of a flow becomes a challenging task. Anonymity networks are an effective way to hide users’ identities. They are widely used for privacy protection, but they have also been abused by lawbreakers for cybercrime. Existing de-anonymization methods have many limitations in today’s complex network environments. The goal of this paper is to find the source of information and de-anonymize the relationship between client and destination, given the entry traffic (defined as the traffic from the client to the entry node) and the exit traffic (the traffic between the exit node and the destination). As the world’s most widely used anonymous communication network, Tor provides users with a low-latency anonymous communication service. We chose Tor for analysis, and this paper presents a series of studies based on Tor traffic. First, to characterize anonymity network traffic, we analyze and measure Tor. From the captured packets, we analyze the size distribution of Tor exit traffic and the countries in which Tor users’ destinations are located. These measurements provide a basis for de-anonymization, and the flow analysis also informs the extraction of Tor traffic features. Second, because Tor traffic is encrypted with SSL, de-anonymization accuracy would improve if Tor traffic could be distinguished from other SSL-encrypted application traffic. Based on our study of the Tor protocol and its traffic, we chose packet length as the feature for identifying Tor traffic, and our method is built on the SVM algorithm.
In an offline environment, we apply our method to classify Tor traffic; experimental results show that classification precision and recall can reach 90%. Third, we design and implement a traffic-analysis-based de-anonymization system (TADAS). Using the k-means algorithm, we extract features from the entry traffic and the exit traffic in order to correlate them and identify the source of a given exit flow. We evaluate the system on the real Tor network with good results: when the stream size exceeds 200 KB, the accuracy of TADAS exceeds 90%. |
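The abstract states only that Tor traffic is identified with an SVM using packet length as the feature, and that precision and recall are used to evaluate it. It does not give the kernel, the exact features, or the training setup. The sketch below fills those gaps with labeled assumptions. It uses a linear SVM trained with the Pegasos sub-gradient method, which keeps it dependency-free. Each flow is represented by a hypothetical histogram over packet-length bins (`BIN_EDGES`). The flows are synthetic toy data (`make_dataset`), in which "Tor" flows are dominated by one illustrative fixed record size. None of these bins, parameters, or data come from the paper.

```python
import random

# Hypothetical packet-length bins (bytes); the paper does not publish its feature set.
BIN_EDGES = [0, 100, 300, 600, 1000, 1601]


def length_histogram(lengths):
    """Fraction of a flow's packets that fall into each length bin."""
    counts = [0] * (len(BIN_EDGES) - 1)
    for n in lengths:
        for i in range(len(counts)):
            if BIN_EDGES[i] <= n < BIN_EDGES[i + 1]:
                counts[i] += 1
                break
    total = max(sum(counts), 1)
    return [c / total for c in counts]


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos (stochastic sub-gradient descent on the hinge loss).

    ys are +1 (Tor) or -1 (other SSL). A constant 1.0 feature is appended as a
    bias term; it is regularized with the weights, a common simplification.
    """
    rng = random.Random(seed)
    data = [(x + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization shrink, then move toward y*x if the margin is violated.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (Tor) or -1 (other SSL) for one feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (Tor, +1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def make_dataset(rng, n, packets_per_flow=50):
    """Toy flows (hypothetical): 'Tor' flows are dominated by one fixed record
    size (550 B, purely illustrative); 'other SSL' flows have spread-out lengths."""
    xs, ys = [], []
    for i in range(n):
        is_tor = i % 2 == 0
        if is_tor:
            lengths = [550 if rng.random() < 0.9 else rng.randint(60, 1500)
                       for _ in range(packets_per_flow)]
        else:
            lengths = [rng.randint(60, 1500) for _ in range(packets_per_flow)]
        xs.append(length_histogram(lengths))
        ys.append(1 if is_tor else -1)
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(1)
    train_x, train_y = make_dataset(rng, 200)
    test_x, test_y = make_dataset(rng, 200)
    w = train_linear_svm(train_x, train_y)
    p, r = precision_recall(test_y, [predict(w, x) for x in test_x])
    print(f"precision={p:.2f} recall={r:.2f}")
```

A linear kernel is chosen here only because it is easy to write without libraries. A kernel SVM (for example an RBF kernel via a library such as scikit-learn) may fit real traffic better. The toy data are far more separable than real encrypted traffic, so the printed scores say nothing about the paper's reported results.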
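The abstract says that TADAS uses k-means to extract features from the entry and exit traffic and then associates the two. It does not say which quantities are clustered. The following is one plausible reading, sketched purely as an assumption. Run 1-D k-means on each flow's packet timestamps to split it into k bursts. Describe the flow by the fraction of its bytes in each burst, in time order. Then link an exit flow to the entry flow with the closest profile, using L1 distance. Byte fractions are used so that constant per-packet overhead on the entry side largely cancels. `burst_profile`, `match_exit_flow`, `make_flow`, `EXAMPLE_BURSTS`, and the 550/1460-byte packet sizes are hypothetical illustrations, not the paper's design.

```python
# Hypothetical per-client burst sizes (bytes), used only to build toy flows.
EXAMPLE_BURSTS = {
    "A": [200_000, 50_000, 100_000, 300_000],
    "B": [50_000, 300_000, 200_000, 100_000],
    "C": [100_000, 100_000, 400_000, 50_000],
}


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalars; returns the k centers in ascending order.

    Centers start evenly spaced between min and max, a simple deterministic
    choice for this sketch (k-means++ seeding is more robust in general).
    """
    lo, hi = min(values), max(values)
    centers = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Empty clusters keep their previous center.
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def burst_profile(packets, k):
    """packets: list of (timestamp_s, size_bytes). Cluster timestamps into k
    bursts and return the fraction of bytes in each burst, in time order."""
    centers = kmeans_1d([t for t, _ in packets], k)
    volume = [0.0] * k
    for t, size in packets:
        nearest = min(range(k), key=lambda c: abs(t - centers[c]))
        volume[nearest] += size
    total = sum(volume) or 1.0
    return [v / total for v in volume]


def match_exit_flow(exit_packets, entry_flows, k=4):
    """Return the id of the entry flow whose burst profile is closest (L1 distance)."""
    target = burst_profile(exit_packets, k)

    def distance(flow_id):
        profile = burst_profile(entry_flows[flow_id], k)
        return sum(abs(a - b) for a, b in zip(target, profile))

    return min(entry_flows, key=distance)


def make_flow(burst_bytes, start, pkt_size, gap=5.0, spacing=0.001):
    """Toy flow: one burst every `gap` seconds, split into equal-size packets."""
    packets = []
    for b, total in enumerate(burst_bytes):
        t0 = start + b * gap
        for i in range(total // pkt_size):
            packets.append((t0 + i * spacing, pkt_size))
    return packets


if __name__ == "__main__":
    # Entry side: small fixed-size records; exit side: larger packets, shifted by latency.
    entry_flows = {cid: make_flow(b, start=0.0, pkt_size=550)
                   for cid, b in EXAMPLE_BURSTS.items()}
    exit_flow = make_flow(EXAMPLE_BURSTS["B"], start=0.3, pkt_size=1460)
    print(match_exit_flow(exit_flow, entry_flows))  # B
```

Clustering on timestamps makes the profile insensitive to the constant latency shift between entry and exit captures. Real Tor traffic adds jitter, padding, and multiplexed streams, so this toy matcher is far easier than the setting evaluated in the paper.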