| Network Address Translation (NAT) offers a convenient way to mitigate the shortage of IPv4 addresses by letting several clients share a single IP address. However, NAT also makes it easier for unauthorized clients to access a network, which poses a serious security risk to the Internet Service Provider (ISP). A NAT detection method is therefore needed that can efficiently distinguish multiple clients behind a NAT device from a single client. Most existing methods rely on a specific field in IP packets; they fail when that field is modified, or they simply cannot meet detection requirements. Moreover, those methods are limited by the clients' operating systems and browsing habits. We therefore propose a NAT detection method based on network traffic features, which relies neither on any specific field in IP packets nor on the clients' operating systems or browsing habits. Because NAT detection can be regarded as a typical binary classification problem, we apply data mining techniques to it. The general idea is first to collect the network traffic features of every host in the network, keyed by IP address; then to treat the IP addresses as instances and the feature parameters as attributes; and finally to obtain the detection result by using data mining techniques to divide network flows into NAT flows and non-NAT flows. The core of our method is the set of network traffic features, which forms the basis for separating NAT devices from normal clients. In this paper, we compare the traffic of NAT devices with that of normal clients, summarize 11 kinds of NAT traffic features that reflect the differences between them, and derive a set of 28 NAT traffic feature parameters. 
We test the classification performance of these feature parameters, and the results show that the set can distinguish NAT devices from normal clients accurately. Our main contribution is a comparative analysis of the traffic of NAT devices and normal clients that yields a set of traffic feature parameters reflecting the differences between them; we then apply feature extraction and feature selection to network traffic data and perform NAT detection with data mining techniques. In our simulation we use two supervised classification algorithms, the C4.5 decision tree and Naive Bayes, and one unsupervised clustering algorithm, K-means. We validate the effectiveness and accuracy of our method by comparing the simulation results of the different algorithms. We also compare the results with and without feature selection, and thereby identify the most effective network traffic feature parameters in the current simulation environment. Finally, we test how P2P flows affect the performance of our method; the simulation results show that the impact is acceptable. |
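
The pipeline summarized above (aggregate traffic per IP address, treat each address as an instance, then classify it as NAT or non-NAT) can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the flow record layout, the three features (distinct destinations, distinct source ports, packet count), and the names `per_ip_features` and `GaussianNB` are assumptions made for illustration, and a small Gaussian Naive Bayes classifier stands in for the Naive Bayes algorithm named in the abstract.

```python
import math
from collections import defaultdict


def per_ip_features(flows):
    """Aggregate flow records into one feature vector per source IP.

    Each flow is a tuple (src_ip, dst_ip, src_port, n_packets).
    The three features are illustrative stand-ins (assumption),
    not the 28 feature parameters of the paper.
    """
    dsts, ports, pkts = defaultdict(set), defaultdict(set), defaultdict(int)
    for src, dst, sport, n in flows:
        dsts[src].add(dst)
        ports[src].add(sport)
        pkts[src] += n
    # One instance per IP: [distinct destinations, distinct ports, packets]
    return {ip: [len(dsts[ip]), len(ports[ip]), pkts[ip]] for ip in dsts}


class GaussianNB:
    """Tiny Gaussian Naive Bayes for numeric feature vectors."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            prior = len(rows) / len(X)
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Small constant keeps the variance strictly positive.
            variances = [
                sum((v - m) ** 2 for v in c) / len(c) + 1e-6
                for c, m in zip(cols, means)
            ]
            self.stats[label] = (prior, means, variances)
        return self

    def predict(self, x):
        def log_posterior(label):
            prior, means, variances = self.stats[label]
            total = math.log(prior)
            for v, m, s2 in zip(x, means, variances):
                total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            return total

        return max(self.stats, key=log_posterior)


# Hypothetical labelled instances: NAT devices tend to show more
# destinations, more source ports, and more packets than single hosts.
train_X = [[40, 120, 5000], [35, 100, 4200], [50, 150, 6100],
           [8, 20, 900], [5, 15, 600], [10, 25, 1100]]
train_y = ["nat", "nat", "nat", "host", "host", "host"]
model = GaussianNB().fit(train_X, train_y)
```

Calling `model.predict(features)` on a vector produced by `per_ip_features` returns `"nat"` or `"host"`. In practice the feature set would be replaced by the selected traffic feature parameters, and the classifier could be swapped for a decision tree or a clustering algorithm without changing the per-IP aggregation step.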