| As everyday life relies more and more on the Internet, network security and the protection of personal privacy attract a great deal of attention and have become hot issues in network research. Network devices such as firewalls, switches, and routers record logs, and analyzing these logs is an important way to detect unusual activity in the network. Data mining provides many methods for finding relevant information in large amounts of data. Among them, the ant-based clustering algorithm is a good clustering method: it can cluster data on its own without any prior knowledge, and it has the advantages of flexibility, robustness, and visibility. However, the ant-based clustering algorithm is time-consuming, and its clustering efficiency and accuracy can still be improved. Very little research has addressed how to apply the ant-based clustering algorithm to classifying logs. Log text has its own characteristics, and the clustering result benefits if logs are transformed into vectors according to these characteristics. This paper studies the ant-based clustering algorithm in depth to solve the problems mentioned above. Based on the characteristics of log text, a new method of transforming logs into space vectors is proposed. The transformation takes both the document frequency and the position of log terms into account, and it proved effective in the experiments. Besides, two improvements are made to the original ant-based clustering algorithm. Firstly, a colony memory that records the most recently dropped objects and their locations is adopted, which helps make the clustering more accurate and effective. 
Secondly, an ordered list of objects, sorted by their fitness to the local neighborhood, is added, which helps ants make better picking-up decisions. In the experiments, compared with other algorithms, our improved algorithm exhibits better clustering quality and accuracy on two different datasets at a tolerable time cost. |
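
To make the colony memory idea concrete, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes that objects are numeric vectors, that similarity is measured by Euclidean distance, and that the memory is a fixed-size queue shared by all ants. The names `ColonyMemory`, `record_drop`, `suggest_location`, and the `capacity` parameter are hypothetical and chosen only for illustration.

```python
from collections import deque
import math


class ColonyMemory:
    """Shared, bounded record of recently dropped objects and their grid locations.

    Illustrative assumption: one memory for the whole colony, Euclidean
    distance as the dissimilarity measure.
    """

    def __init__(self, capacity=8):
        # A deque with maxlen silently discards the oldest entry when full,
        # so only the most recent drops are remembered.
        self._entries = deque(maxlen=capacity)

    def record_drop(self, vector, location):
        """Remember that an object with this vector was dropped at this location."""
        self._entries.append((tuple(vector), tuple(location)))

    def suggest_location(self, vector):
        """Return the location of the remembered object closest to `vector`.

        Returns None when nothing has been dropped yet.
        """
        if not self._entries:
            return None
        nearest = min(self._entries, key=lambda entry: math.dist(entry[0], vector))
        return nearest[1]


# Usage: an ant carrying an object asks the memory where similar objects were left.
memory = ColonyMemory(capacity=2)
memory.record_drop([0.0, 0.0], (1, 1))
memory.record_drop([1.0, 1.0], (5, 5))
target = memory.suggest_location([0.9, 1.1])
```

In such a design, an ant carrying an object could move toward the suggested location instead of walking randomly, which is one plausible way a memory of recent drops might shorten the search for a suitable cluster.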