| In recent years, "big data" has become a hot topic worldwide, and big data technology has been widely applied in various fields. The electric power system is one of its most important application fields. With the rapid development of smart grids, smart meters collect large amounts of residential electricity consumption data, which contain latent information about residents' electricity usage habits. Mining and analyzing these data with data mining algorithms can help power grid operators classify residents in a personalized way and thereby provide them with better services. Moreover, using the clustering results to forecast residential electricity load for each category not only avoids the difficulty of capturing the load pattern of a single household, but also makes the load forecasting curve more stable, thereby improving the accuracy of load forecasting and providing data support for the formulation of future demand-side response policies. This paper first reviews in detail the state of research, at home and abroad, on big data technology, residential electricity behavior analysis, and residential electricity load forecasting. It then proposes the MS-Kmeans algorithm on the Hadoop platform and the deep-learning-based L-S-Seq2Seq model, and conducts experiments on residential electricity behavior analysis and residential electricity load forecasting. To address the problems of initial point selection, sensitivity to outliers, and clustering instability in the traditional K-means algorithm, this paper proposes an MS-Kmeans algorithm based on the MapReduce framework. The algorithm divides the dataset into multiple sub-datasets, selects multiple non-outlier points in the sub-datasets as initial candidate center points, and performs mean shift on them. Once the initial candidate center points have shifted to high-density areas, the minimum-maximum principle is applied: K points
are selected as initial center points from the initial candidate center points, and the K-means algorithm is then run on the complete dataset to find the final center points. To speed up the clustering of large datasets, the algorithm is implemented on the Hadoop platform, using the MapReduce framework for parallel computation. Experiments show that the parallel MS-Kmeans algorithm is feasible for clustering massive residential electricity data and performs well in terms of parallelism and stability. To address the difficulty of choosing the input sequence length and the weak feature extraction ability of traditional power load forecasting methods, this paper proposes a dual-channel L-S-Seq2Seq load forecasting model that accepts both long and short sequence inputs. The model consists mainly of an L-Seq2Seq channel for processing long sequence inputs and an S-Seq2Seq channel for processing short sequence inputs. The encoders of both channels use CC-LSTM to fuse global and local features of the power load and generate the corresponding hidden vectors. In addition, an attention mechanism is introduced so that the decoder can focus on the hidden vectors of different time steps when predicting the load at different times. Specifically, the L-Seq2Seq decoder uses a CC-B attention mechanism based on cycle and time variation, while the S-Seq2Seq decoder uses a C-B attention mechanism based on time variation. Finally, the outputs of the two decoders are fused to produce the predicted power load values. Experimental results show that the model significantly outperforms the Seq2Seq model on all four evaluation metrics (MAE, RMSE, MAPE, and R²) and has good power load forecasting ability. In addition, ablation experiments demonstrate the necessity of each module of the model. |
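
The abstract does not spell out how K initial centers are picked from the shifted candidates under the minimum-maximum principle. The following is a minimal sketch only, assuming the principle refers to the common max-min distance heuristic (farthest-point selection): each new center is the candidate whose distance to its nearest already-chosen center is largest. The function name `max_min_select`, the choice of the first candidate as the starting center, and the sample points are all hypothetical and are not taken from the paper.

```python
import math


def max_min_select(candidates, k):
    """Pick k initial centers from candidate points by farthest-point selection.

    Assumption: the first center is simply the first candidate; the source
    does not say how the starting point is chosen.
    """
    if not 1 <= k <= len(candidates):
        raise ValueError("k must be between 1 and the number of candidates")

    chosen = [0]  # indices of the selected candidates
    # min_dist[i] = distance from candidate i to its nearest chosen center
    min_dist = [math.dist(p, candidates[0]) for p in candidates]

    while len(chosen) < k:
        # Take the candidate whose nearest chosen center is farthest away.
        nxt = max(range(len(candidates)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        # Refresh nearest-center distances against the newly chosen center.
        for i, p in enumerate(candidates):
            min_dist[i] = min(min_dist[i], math.dist(p, candidates[nxt]))

    return [candidates[i] for i in chosen]


# Hypothetical candidates, as might remain after mean shift: three groups.
candidates = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centers = max_min_select(candidates, 3)
```

With these sample points the selection picks one point from each group, which is the property that makes such a heuristic attractive as a K-means initialization: the starting centers are spread out rather than clustered together.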