| In the new era of informatization, the campus network is the network that college and university students and teachers use most. As its utilization has grown rapidly, it has produced a huge volume of user log data. In the era of big data, using big data technology to analyze this massive campus network data and to mine the Internet behavior patterns of campus network users efficiently and accurately is of great significance for optimizing network management and student management. The standard K-means algorithm is one of the most commonly used methods for analyzing network user behavior, but it has three drawbacks: the value of k is uncertain, the result is sensitive to the initial centers, and it is poorly suited to clustering large data sets. To address these deficiencies, this paper proposes an improved algorithm, SOSK-means++ (Spark based Optimized Silhouette K-means++), which combines K-means++ with silhouette coefficient optimization and is implemented on the Spark distributed computing framework, and applies it to the analysis of user behavior on a university campus network. Experimental results and practical application show that the improved algorithm reduces, to a certain extent, the influence of the k value and the initial centers on clustering accuracy and efficiency, and improves clustering accuracy. The Spark platform also effectively addresses the long running time of the standard K-means algorithm on large data volumes and improves the parallel computing performance of the algorithm. Taking the billing system of a university campus network as the research object, this paper analyzes and visualizes campus network user behavior from network behavior logs on a Spark-based distributed platform, optimizes the standard K-means algorithm, and applies the optimized algorithm to campus network user behavior analysis. The main research work of this paper is as follows: (1) Based on a study of the limitations of the standard K-means clustering algorithm, namely the uncertainty of the k value, the sensitivity to the initial centers, and the excessive running time when clustering massive data, the improved algorithm SOSK-means++ (Spark based Optimized Silhouette K-means++) is proposed. It combines the K-means++ algorithm with silhouette coefficient optimization to improve the standard K-means algorithm mainly in two respects, uses fast distance calculation and the standardized Euclidean distance formula, and is implemented on the Spark parallel platform. The accuracy and speedup ratio of the algorithm are verified with experimental data. (2) Based on the above research and results, an extensible Spark-based campus network user behavior analysis system is built to collect and preprocess campus network user data. Statistical analysis and the improved SOSK-means++ algorithm are used to cluster the campus network user data. User behavior is analyzed from the campus network log data, including login time, logout time, usage duration, and traffic used, and the analysis results are written to a MySQL database. Finally, a web application is developed with the Spring MVC framework to visualize and analyze in detail the data in the MySQL database. Comparative experiments show that the clustering results of the SOSK-means++ algorithm in campus network user behavior analysis are accurate, stable, and effective. |
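The core idea described above, K-means++ seeding combined with silhouette-based selection of k over standardized features, can be illustrated with a minimal single-machine sketch. This is not the paper's SOSK-means++ implementation: it does not use Spark or the fast distance calculation, and every function name (`standardize`, `kmeans_pp_init`, `kmeans`, `silhouette`, `choose_k`) and parameter here is a hypothetical choice made for illustration only.

```python
import math
import random


def standardize(points):
    """Scale each feature by its standard deviation, so that plain Euclidean
    distance on the result equals the standardized Euclidean distance."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((p[d] - means[d]) ** 2 for p in points) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant features
    return [tuple((p[d] - means[d]) / stds[d] for d in range(dims)) for p in points]


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: draw each new center with probability
    proportional to the squared distance to the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(dist(p, c) for c in centers) ** 2 for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def kmeans(points, k, rng, iters=50):
    """Lloyd iterations starting from K-means++ centers; returns labels."""
    centers = kmeans_pp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient, s = (b - a) / max(a, b) per point."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_range, seed=0):
    """Pick the k in k_range with the highest mean silhouette coefficient."""
    rng = random.Random(seed)
    scaled = standardize(points)
    scores = {k: silhouette(scaled, kmeans(scaled, k, rng)) for k in k_range}
    return max(scores, key=scores.get), scores
```

In a setting like the one the abstract describes, each point would presumably be a per-user feature vector derived from the logs (for example usage duration and traffic), and `choose_k(features, range(2, 10))` would return the selected k together with the silhouette score of every candidate. The pairwise silhouette computation here is quadratic in the number of points, which is one reason a distributed implementation matters at campus-log scale.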