| The power system reaches into many sectors of the national economy, production, and daily life. Everyday life is closely tied to it, and normal industrial production depends heavily on its stable operation. As informatization and digitalization deepen, the demand for and dependence on the power system in production and daily life continue to grow. This has also driven investment in and construction of smart devices, smart grids, and smart systems, which makes the volume of data generated by the power system increase exponentially. In the era of big data, this large volume of data hides important information such as the operating status and development trends of the power system, and mining this hidden information can create great value. Research on data mining methods for power systems is the key to obtaining this value. Data mining is a mathematical means of extracting the value hidden in information, and it aims to transform knowledge into value. Data mining is therefore needed to make full use of power data and to support power system state analysis and decision-making. Traditional data mining methods, however, rely on serial mining on a single node, and with the rapid growth in data volume they can no longer meet the need. The emergence of cloud computing provides a new way to solve this problem: it uses a distributed architecture and a parallel computing model to connect a large number of computers and greatly increase the available computing power, which makes it possible to process massive data. First, the structure and construction process of Hadoop, an open-source cloud computing platform, are studied, and six distributed computers are set up to build an experimental Hadoop platform. Second, two common clustering algorithms, K-means and Canopy, are studied, and the two algorithms
are combined to cluster power system data. At the same time, to address the problem of bad data identification in power systems, a parallel fusion algorithm is proposed based on the characteristics of the traditional gap statistic method and the elbow criterion. Then, to better handle massive data, the MapReduce model is used to parallelize all of the algorithms, and an example is solved with the parallel model, which verifies the feasibility and effectiveness of the parallel algorithms. The main work and contributions of this paper are summarized as follows: 1) Cloud computing theory and technology are combined with data mining methods so that they can be applied to massive power system data, improving data processing efficiency and ensuring that the value of the data is extracted effectively. The framework for applying cloud computing to data mining is studied. First, massive data from the large-scale distributed power system is obtained through an intelligent information collection system. Following the idea of moving computation to the data, the transmission loss between distributed cloud nodes is reduced. Dimensionality reduction is applied to high-dimensional data, which effectively reduces the amount of computation and improves the efficiency of the algorithm. This work lays a foundation for further research on cloud-computing-based data mining for power systems. 2) A parallel data mining clustering algorithm based on K-means and Canopy is proposed. First, the traditional clustering algorithms K-means and Canopy are studied, and their characteristics and applicability are analyzed. The two are then combined, with the K-means algorithm iterating the clustering until convergence. For the concrete procedure, two different forms of the parallel algorithm are designed, both of which run in the MapReduce framework. Data samples of residential power consumption are collected, and the
parallel clustering algorithm is used to process them. The results verify the effectiveness of the algorithm, which can provide decision support for power dispatching. 3) To address the problem of bad data in power system data, a bad data identification algorithm based on the gap statistic and the elbow criterion is proposed, which avoids excessive error and ensures identification accuracy. Finally, a simulation example is designed to verify the effectiveness and accuracy of the proposed algorithm on massive data. |
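
To make the combination in contribution 2) concrete, the following is a minimal single-process sketch, not the implementation evaluated in this paper. It assumes that Canopy is used to choose the initial centers and that K-means then iterates in a map step (assign each point to its nearest center) and a reduce step (average the points per center), which is one common way of combining the two. The function names, the single distance threshold `t2`, and the two-feature sample points are all hypothetical; a full Canopy pass also uses a looser threshold T1, which is omitted here.

```python
import math
from collections import defaultdict


def canopy_centers(points, t2):
    """Simplified Canopy pass: a point becomes a new center unless it lies
    within distance t2 of a center chosen earlier (T1 is omitted)."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > t2 for c in centers):
            centers.append(p)
    return centers


def map_assign(chunk, centers):
    """Map step: emit (index of nearest center, (point, 1)) per point."""
    for p in chunk:
        k = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        yield k, (p, 1)


def reduce_update(pairs, centers):
    """Reduce step: sum the points per key and return the mean as the new
    center; a center that received no points is kept unchanged."""
    sums = {}
    counts = defaultdict(int)
    for k, (p, n) in pairs:
        if k in sums:
            sums[k] = [s + x for s, x in zip(sums[k], p)]
        else:
            sums[k] = list(p)
        counts[k] += n
    return [
        tuple(s / counts[k] for s in sums[k]) if k in sums else centers[k]
        for k in range(len(centers))
    ]


def kmeans_mapreduce(chunks, centers, max_iter=50, tol=1e-9):
    """Repeat map and reduce until no center moves by more than tol."""
    for _ in range(max_iter):
        pairs = [kv for chunk in chunks for kv in map_assign(chunk, centers)]
        new_centers = reduce_update(pairs, centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


# Hypothetical two-feature load samples, split into two chunks that stand in
# for the input splits handled by separate map tasks.
chunks = [
    [(1.0, 1.0), (1.2, 0.8), (9.0, 9.0)],
    [(0.8, 1.2), (9.2, 8.8), (8.8, 9.2)],
]
points = [p for chunk in chunks for p in chunk]
seeds = canopy_centers(points, t2=3.0)
centers = kmeans_mapreduce(chunks, seeds)
print(seeds, centers)
```

In this sketch the number of clusters is not supplied by hand: it follows from the number of Canopy centers, which depends on the assumed threshold `t2`. On a Hadoop cluster the map and reduce functions would run as distributed tasks over the input splits rather than as in-process loops.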