| With the rapid advancement of the smart grid, the power system has entered the era of "big data." Monitoring data from all kinds of power grid equipment are sent to the monitoring center, forming massive power grid stream data that must be processed quickly. Faced with such data, the traditional single-machine processing approach suffers from poor real-time responsiveness and low reliability. The widespread application of cloud computing technology across industries, however, has provided new ideas for processing big data in the power industry. Storm, a stream data processing framework from the cloud computing field, offers high throughput and strong real-time performance, which suits the real-time, continuous processing of power system monitoring data. This paper studies the parallel analysis and diagnosis of power grid equipment condition monitoring data based on the Storm distributed big data computing framework. The relevance vector machine (RVM) algorithm is introduced, and a parallel implementation scheme of the "one-against-one" RVM algorithm based on the Storm framework is designed. The scheme is divided into two modules: modeling and classification testing. The modeling module establishes the initial model and updates it incrementally, while the classification testing module performs real-time, rapid diagnosis of massive power grid stream data. Because Storm is well suited to incremental computing, an incremental learning method is added to cope with concept drift in power grid stream data. Comparison experiments verify that the designed scheme has higher classification accuracy and better timeliness. Performance tests of the algorithm model deployed on the Storm platform verify that the Storm cluster has high throughput and low latency and can meet the needs of online real-time processing of massive monitoring stream data from large-scale 
power grid equipment. The K-nearest neighbor (KNN) algorithm is also introduced, and a parallel KNN algorithm based on the Storm framework, Storm-KNN (S-KNN), is designed and implemented for fast classification and diagnosis of massive monitoring stream data from power grid equipment. First, the known sample set is randomly divided into blocks of equal size; the similarity between the unknown sample and the known samples in each block is then computed in parallel to obtain the K nearest neighbors within each block. Finally, the per-block results are gathered and merged by a parallel half-comparison method to obtain the final K nearest neighbors, from which the unknown sample is classified and identified. The experimental results show that the parallel KNN algorithm has better performance in a cluster environment and can meet current practical engineering needs. |
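
The S-KNN steps described above (random equal-size blocking, per-block K nearest neighbors, then a merge of the per-block results) can be illustrated with a minimal single-process Python sketch. It is not the Storm implementation from this work: the function names are hypothetical, Euclidean distance stands in for the unspecified similarity measure, and the "half-comparison" merge is interpreted here as a pairwise reduction that halves the number of candidate lists in each round. All three are assumptions made for illustration only.

```python
import heapq
import math
import random
from collections import Counter


def split_blocks(samples, n_blocks, seed=0):
    """Shuffle the known samples and deal them into n_blocks near-equal blocks."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_blocks] for i in range(n_blocks)]


def local_top_k(block, query, k):
    """Return the k nearest (distance, label) pairs inside one block.

    In a distributed topology each block would be handled by its own worker;
    here the blocks are simply processed one after another.
    Assumption: Euclidean distance, so smaller means more similar.
    """
    scored = [(math.dist(features, query), label) for features, label in block]
    return heapq.nsmallest(k, scored)


def pairwise_merge(candidates, k):
    """Merge per-block candidate lists two at a time until one list remains.

    Assumed reading of the "half-comparison" step: each round merges
    neighbouring lists and keeps only the k best, halving the list count.
    """
    while len(candidates) > 1:
        merged = [
            heapq.nsmallest(k, candidates[i] + candidates[i + 1])
            for i in range(0, len(candidates) - 1, 2)
        ]
        if len(candidates) % 2:  # an odd list out is carried to the next round
            merged.append(candidates[-1])
        candidates = merged
    return candidates[0]


def classify(samples, query, k=3, n_blocks=4):
    """Classify query by majority vote over its k nearest known samples."""
    blocks = split_blocks(samples, n_blocks)
    candidates = [local_top_k(block, query, k) for block in blocks]
    nearest = pairwise_merge(candidates, k)
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Because every block keeps its own K best candidates, the merged result is the same as a K-nearest search over the whole sample set; the blocking only changes where the distance computations run. Calling `classify(samples, query)` with `samples` given as a list of `(feature_tuple, label)` pairs returns the predicted label for `query`.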