Research and Implementation of a Method for Tracing User Behavior Based on an Improved K-means Algorithm

Posted on: 2021-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Ma
Full Text: PDF
GTID: 2518306308475304
Subject: Computer Science and Technology
Abstract/Summary:
With the widespread use of mechanisms such as the Dynamic Host Configuration Protocol (DHCP), many Internet service providers assign their customers dynamic IP addresses that change periodically, which makes long-term tracing of user traffic difficult. DNS access records reflect how people visit sites; viewed over a longer time range, they capture a network user's access intentions and access habits. DNS access records are therefore well suited to tracing user behavior, and in practice they are readily available and can be stored for long-term analysis. If a large number of a user's DNS access records are collected and mined for behavior patterns, application requirements such as user identification and user analysis can be met. It is therefore worth investigating whether user behavior can be traced from DNS traffic.

This thesis proposes a user behavior tracing method based on an improved semi-supervised k-means machine learning algorithm, and designs a user behavior tracing system. The system collects DNS traffic data and applies the improved semi-supervised k-means method to mine user behavior patterns, so that users can be identified even when their IP addresses change. This allows a user's DNS traffic to be followed over a long period, so that as much user behavior as possible can be traced when data is limited.

The main work of this project includes two aspects: 1) An improved k-means algorithm is proposed for user behavior traceability, including a feature vector construction method based on domain name access behavior, an improved k-means algorithm based on equal group division, and an optimal result selection algorithm based on multiple initializations. 2) Based on these methods, a system that traces user behavior with the improved k-means algorithm was designed and implemented. The system consists mainly of a server side, data storage, and a Web display. The server side includes a data acquisition module, a data processing module, and an improved k-means model algorithm module. The DNS traffic of users is collected periodically; after filtering, algorithm training, and other steps, the DNS traffic of different users can be separated. Experimental results show that the improved semi-supervised k-means algorithm achieves high accuracy when the number of users is small.
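The abstract names three algorithmic ingredients without giving their details: feature vectors built from domain name access behavior, a k-means variant, and selection of the best result over multiple initializations. The following is only a minimal sketch of the general idea, not the thesis's method. It uses plain (unsupervised) k-means, omits the semi-supervised and equal-group-division improvements, and assumes that each observation is one collection window's list of queried domains and that the best run is the one with the lowest within-cluster sum of squared distances. All function names, domain names, and data are hypothetical.

```python
import random
from collections import Counter


def build_feature_vectors(sessions, vocabulary):
    """Map each session (a list of queried domains) to a normalized
    domain-frequency vector over a fixed vocabulary (assumed encoding)."""
    vectors = []
    for domains in sessions:
        counts = Counter(domains)
        total = sum(counts[d] for d in vocabulary) or 1  # avoid division by zero
        vectors.append([counts[d] / total for d in vocabulary])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(vectors, k, rng, max_iter=50):
    """One run of plain k-means from a random initialization.
    Returns (labels, inertia)."""
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(squared_distance(v, centroids[lab]) for v, lab in zip(vectors, labels))
    return labels, inertia


def best_of_n(vectors, k, n_init=10, seed=0):
    """Run k-means n_init times and keep the run with the lowest inertia."""
    rng = random.Random(seed)
    return min((kmeans_once(vectors, k, rng) for _ in range(n_init)), key=lambda r: r[1])


# Hypothetical data: four collection windows from two users with distinct habits.
sessions = [
    ["mail.example.com", "news.example.org", "mail.example.com"],
    ["mail.example.com", "news.example.org"],
    ["video.example.net", "games.example.net"],
    ["video.example.net", "video.example.net", "games.example.net"],
]
vocabulary = sorted({d for s in sessions for d in s})
vectors = build_feature_vectors(sessions, vocabulary)
labels, inertia = best_of_n(vectors, k=2)
print(labels, round(inertia, 4))
```

In this toy data the first two windows and the last two windows end up in separate clusters, which is the sense in which traffic seen under different IP addresses could be attributed to the same user. Restarting from several initializations matters because a single k-means run can stop in a poor local optimum.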
Keywords/Search Tags:improved k-means algorithm, DNS data mining, cluster analysis, user behavior traceability