| With the ongoing development and construction of the smart grid, the level of informatization in electric power systems is rising rapidly, the variety and scale of power data are growing, and big data streams place higher demands on data processing. Cloud computing and big data technology provide a technical basis for handling big data streams in the smart grid. In a cloud computing cluster environment, servers distributed across different regions are integrated into a single system to provide better computing performance. However, big data streams, especially in sudden situations such as alarms and abnormal data, put greater processing pressure on the cluster. In a cluster environment, fast distribution and scheduling of data processing tasks can disperse this pressure and make better use of the cluster's resources. This paper focuses on how to process and distribute big data streams quickly and reliably in a cloud cluster. The current state of data stream processing and of task scheduling and distribution is surveyed, and the existing problems of big data stream processing are analyzed. A hierarchical model of the cloud cluster is built and used to manage the cluster and balance the computing load of each sub-cluster. A scheduling and distribution mechanism for data processing tasks and a scheduling node selection algorithm are then proposed, which make effective use of the cluster's basic resources and balance the computing load of each node. Specifically, the paper sets up a regional, layered model of the cloud cluster that divides the cluster into multiple levels and multiple regions; central nodes control the sub-regions, which together form the whole area, and a dual load balancing strategy is applied to each sub-region and to the global region so that local and global resources are integrated effectively. Each region chooses its control node by election; the control node records and 
maintains the resource status of every node in the region. If a control node fails, a new control node is elected immediately and the node resource information is pushed to it. Based on this model, a scheduling and distribution method for data processing tasks is proposed that combines a multi-queue dynamic priority scheduling algorithm (DPCS) with a node selection algorithm (NRPS). In this method, each cluster node identifies and processes the data it receives, selects some data processing tasks to schedule according to its own load and resource state, and chooses the target node for each scheduled task with the node selection algorithm. Task scheduling is adjusted according to the data content, so that important data are processed reliably and distributed in time. Finally, an experimental environment is built on Storm, the scheduling and distribution algorithms for big data stream processing tasks are implemented, and comparative experiments are carried out for verification. In the target node selection experiment, the method is compared with the minimum-load-first scheduling algorithm, and a load balancing difference is introduced to measure the state of load balance between nodes. The results show that the method disperses node load effectively. In the scheduling and distribution experiments, comparison with the EDF, HVF and DVD algorithms shows that the method significantly improves task completion rate and completion time. |
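
The abstract names a multi-queue dynamic priority scheduler, a node selection step, and a load balancing difference, but does not define them. The following is only an illustrative sketch of those three ideas under stated assumptions, not the paper's DPCS or NRPS algorithms: the class `MultiQueueScheduler`, the functions `select_node` and `load_balance_difference`, the free-resource weights, the promotion threshold, and the use of a population standard deviation as the balance measure are all hypothetical choices made for illustration.

```python
import statistics
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_free: float    # assumed metric: fraction of CPU idle, 0..1
    mem_free: float    # assumed metric: fraction of memory free, 0..1
    load: float = 0.0  # assumed metric: abstract load units already assigned


def select_node(nodes, w_cpu=0.5, w_mem=0.5):
    """Pick the node with the highest weighted free-resource score.

    Ties are broken in favour of the node with the lower current load.
    The weights are illustrative, not taken from the paper.
    """
    return max(nodes, key=lambda n: (w_cpu * n.cpu_free + w_mem * n.mem_free, -n.load))


def load_balance_difference(nodes):
    """Assumed balance measure: population standard deviation of node loads.

    A value of 0 means every node carries the same load.
    """
    return statistics.pstdev([n.load for n in nodes])


class MultiQueueScheduler:
    """Several FIFO queues, index 0 being the most urgent (e.g. alarms).

    Priority is dynamic: a task that has waited `promote_after` ticks in a
    lower queue is moved up one level, so routine data cannot starve.
    """

    def __init__(self, levels=3, promote_after=2):
        self.queues = [deque() for _ in range(levels)]
        self.promote_after = promote_after

    def submit(self, task, level):
        # Each entry is [task, ticks waited at the current level].
        self.queues[level].append([task, 0])

    def tick(self):
        # Walk levels in ascending order so a task promoted in this tick
        # lands in a queue that has already been aged and is not aged twice.
        for level in range(1, len(self.queues)):
            keep = deque()
            for entry in self.queues[level]:
                entry[1] += 1
                if entry[1] >= self.promote_after:
                    entry[1] = 0
                    self.queues[level - 1].append(entry)
                else:
                    keep.append(entry)
            self.queues[level] = keep

    def pop(self):
        # Serve the most urgent non-empty queue first.
        for queue in self.queues:
            if queue:
                return queue.popleft()[0]
        return None
```

In such a sketch, a node would call `pop()` to pick the next task, call `select_node()` over the resource records held by its region's control node to choose a target, and track `load_balance_difference()` over time to see whether the load is spreading evenly.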