| With the introduction of smart grids and smart electricity consumption, the traditional power industry is changing rapidly. Smart terminal collection devices are now widely deployed across the power system, so the grid generates massive amounts of data at every moment. How to make effective use of these big data resources is an urgent problem for the power industry. In demand-side response based on time-of-use electricity prices, analyzing the electricity consumption behavior of large numbers of users is important for mobilizing them to participate actively in demand-side response and for relieving the operating pressure on the grid during peak hours. This paper first introduces the background of big data and time-of-use electricity prices in smart grids, reviews the current state of domestic and international research on user electricity consumption behavior, and explains the related technologies of load characteristic indicators, data mining, and big data platforms, laying a theoretical foundation for the subsequent analysis of user electricity consumption behavior and the mining of potential demand-side response users. Second, to address the blindness and uncertainty of initial value selection in classic K-means, this paper proposes an improved K-means clustering algorithm based on the Canopy+ algorithm. The output of Canopy+ pre-clustering is used as the initial value of the K-means algorithm, which removes the blindness in choosing the number of clusters and the uncertainty in choosing the initial cluster centers. The improved clustering algorithm and the original K-means algorithm are then compared on UCI data. The experiments show that the improved clustering algorithm performs well in both K value selection and clustering accuracy. Finally, the preprocessed daily load data of large 
users is clustered using both stand-alone K-means and in-memory parallelization. The experimental results show that, while also reducing the number of iterations, the improved clustering algorithm lowers the sum of squared errors by 3659.906, raises the silhouette coefficient by 0.03, and lowers the Davies-Bouldin (DB) index by 0.06. The improved algorithm is then applied, in the Spark big data analysis environment, to a parallel clustering analysis of the daily load data of 7609 industrial and commercial users in an area of Shenyang for March 2021, and the users are divided into five categories. Among these, the second and fifth categories are the high-quality demand-side response users with peak-shaving potential. Although the shares of electricity that the second category consumes in peak, valley, and normal hours differ little, its overall consumption is large and it operates in a high-load state, so it has some peak-shaving potential. For the fifth category, the share of electricity consumed in peak hours is relatively high, the shares in normal and valley hours are relatively low, and the differences between peak and valley consumption and between peak and average consumption are relatively large, so it has good potential for peak shaving and valley filling. In addition, based on the typical daily load curve of each category, the electricity consumption characteristics of each category are analyzed and an electricity pricing strategy is proposed. |
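
The idea of seeding K-means from a Canopy pre-clustering pass can be illustrated with a minimal sketch. It is not the paper's Canopy+ algorithm, whose details are not given here. It assumes the plain Canopy idea with a single distance threshold, and the names `canopy_centers`, `kmeans`, and `t2`, as well as the toy points, are hypothetical. The number of canopies found fixes K, and the canopy centers serve as the initial cluster centers.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t2, seed=0):
    """Simplified single-threshold Canopy pass (an assumption, not Canopy+).

    Repeatedly take one remaining point as a canopy center and discard
    every remaining point that lies within t2 of it.
    """
    rng = random.Random(seed)
    remaining = list(points)
    rng.shuffle(remaining)
    centers = []
    while remaining:
        center = remaining.pop()
        centers.append(center)
        remaining = [p for p in remaining if dist(p, center) > t2]
    return centers


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def kmeans(points, init_centers, max_iter=100):
    """Lloyd's K-means started from the given centers; K = len(init_centers)."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Mean of each cluster; an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Toy data (hypothetical): three well-separated groups of four points each.
blobs = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx, dy in offsets]

seeds = canopy_centers(points, t2=3.0)   # number of canopies fixes K
centers, labels = kmeans(points, seeds)  # K-means refines the canopy centers
```

In this sketch the threshold `t2` has to be chosen to match the scale of the data; how Canopy+ sets its thresholds is not described in this abstract.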