| With the development and popularization of campus networks, a huge amount of authentication data has accumulated. Mining valuable information from these data is of great research significance, and campus network user behavior is one of the active research topics. In traditional research, scholars commonly choose the number of clusters empirically and cluster the authentication data without removing outliers, so the number of clusters is determined subjectively and the clustering results are affected by outliers. This paper first analyzes campus network authentication data to mine the characteristics of users’ online time behavior. It then proposes and implements a clustering algorithm, FRCK (Fusion of Rough Clustering and K-means), which obtains the optimal number of clusters while removing outliers. In addition, the paper mines users’ spatial behavior characteristics. The research on users’ online time behavior consists of: extracting and cleaning undergraduate online authentication data to calculate the online time vector of each student; improving the Canopy algorithm into the K-Canopy algorithm to remove outliers; determining the optimal number of clusters with performance indexes and a voting mechanism; clustering the outlier-free weekday and weekend vector sets separately with the K-means algorithm; and describing the online time characteristics of each type of student by analyzing the clustering results, and showing how students’ time behavior changes by comparing their characteristics across the four years. These results are valuable for student management and related applications. The FRCK algorithm removes outliers dynamically during the iteration that determines the optimal number of clusters. The research on it consists of: listing the definitions and terms of the FRCK algorithm; obtaining an initial number of clusters K; iterating over two steps, updating the centroids and reducing K to remove outliers, until all centroids are stable; and verifying the FRCK algorithm on the student online authentication data set. The experimental results show that FRCK effectively removes outliers and automatically determines the number of clusters. The study of users’ spatial behavior consists of: using distributed statistical algorithms based on the MapReduce framework to count the wireless network connections of each campus building; collecting the latitude and longitude coordinates of the buildings, and using an R-tree index and a density clustering algorithm to divide the campus into 10 regions; obtaining the number of connections per region by combining the statistical results with the clustering results; and analyzing the results to explore the patterns of crowd movement between regions. The experimental results can serve as a reference for school bus routing, shared bicycle deployment, and campus functional area planning. |
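
The two-step iteration described for FRCK (update the centroids, then reduce K to remove outliers, until the centroids are stable) can be illustrated with a small sketch. This is not the FRCK algorithm itself: the abstract does not give its rough-clustering definitions or its outlier criterion, so the sketch assumes a plain K-means update and a hypothetical rule that any cluster with fewer than `min_size` members is treated as outliers. The function name `shrink_k_means` and all of its parameters are illustrative assumptions.

```python
from math import dist


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def shrink_k_means(points, init_centroids, min_size=2, tol=1e-9, max_iter=100):
    """Alternate K-means updates with dropping under-populated clusters.

    ASSUMPTION: clusters with fewer than `min_size` members are treated as
    outliers. This rule is hypothetical; the abstract does not state the
    criterion FRCK uses.
    """
    active = list(points)                          # points still treated as inliers
    centroids = [tuple(c) for c in init_centroids]
    outliers = []
    for _ in range(max_iter):
        # Assign every active point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in active:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Reduce K: move the members of small clusters to the outlier list.
        kept = [c for c in clusters if len(c) >= min_size]
        for c in clusters:
            if len(c) < min_size:
                outliers.extend(c)
        if not kept:
            raise ValueError("every cluster fell below min_size")
        active = [p for c in kept for p in c]
        # Update the centroids of the surviving clusters.
        new_centroids = [mean(c) for c in kept]
        stable = len(new_centroids) == len(centroids) and all(
            dist(a, b) <= tol for a, b in zip(new_centroids, centroids)
        )
        centroids = new_centroids
        if stable:                                 # K and centroids both unchanged
            break
    return centroids, outliers


# Toy data: two tight groups and one distant point (made up for illustration).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
centers, dropped = shrink_k_means(pts, [(0, 0), (10, 10), (50, 50)])
print(len(centers), dropped)
```

Starting from a deliberately large K and letting the iteration shrink it is one way to avoid fixing the number of clusters by hand, which is the motivation the abstract gives for FRCK.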