| Distributed stream processing systems are widely used in medical, financial, and e-commerce scenarios because they can process data in real time. However, they still face several practical problems. On the one hand, the fluctuating rate of the input stream can leave a system with fixed resources short of resources, which causes high latency and low throughput; in addition, the partitioning methods used to handle skewed data distributions waste resources when the stream is balanced. On the other hand, partitioning methods based on the principle of “least allocation first” distribute data to multiple nodes over the network, which leads to high state-storage and data-transmission overhead. To solve these problems, this thesis studies intelligent resource allocation in distributed stream processing systems from the perspectives of dynamic resource allocation and balanced partitioning, covering dynamic resource allocation with adaptive partitioning, and imbalance-aware partitioning. The main work of this thesis is as follows:
(1) The Adaptive Load Partitioning Scaling algorithm, ALPS, is proposed to address the resource shortage and skewed distribution caused by dynamic streams, as well as the resource waste caused by partitioning balanced streams. First, ALPS collects system performance metrics to build a partitioning decision model and then partitions the stream adaptively according to this model, which reduces resource waste. Second, ALPS uses the performance metrics to build a performance model, based on which it dynamically allocates system resources to overcome resource shortages. Finally, ALPS is evaluated on text streams and Reddit streams with different rates and skewness. Compared with the state-of-the-art scaling algorithm DS2, ALPS not only reduces resource waste but also reduces end-to-end latency by two orders of magnitude and doubles throughput.
(2) The Temporal-Spatial Aware partitioning algorithm, TSA, is proposed to reduce the high overhead of state storage and data transmission. TSA first identifies hot and cold data and then applies the principle of “least allocation first” to achieve balanced partitioning. For data allocation, TSA introduces an imbalance-aware increment strategy and an imbalance-aware locality strategy, which reduce state-storage and data-transmission overhead by relaxing the imbalance constraint. Finally, TSA is compared with HASH, RR, PKG, W-C, and D-C on text streams with different skewness levels and numbers of nodes. The results show that, compared with the state-of-the-art partitioning algorithm D-C, TSA reduces state-storage overhead by 71% and data-transmission overhead by 3%.
(3) Based on ALPS and TSA, the Temporal-Spatial aware & Adaptive Load Partitioning Scaling system, TS-ALPS, is designed and implemented, featuring low latency, high throughput, and high resource utilization. TS-ALPS consists of four main modules: (i) a performance collection module, which collects real performance data at runtime; (ii) a distributed state management module, which manages the distributed state generated by partitioning; (iii) an adaptive partitioning scaling module, which performs dynamic resource allocation; and (iv) a temporal-spatial aware partitioning module, which achieves balanced partitioning. Finally, experiments on dynamic text streams compare TS-ALPS with the state-of-the-art distributed stream processing system Flink. The results show that TS-ALPS achieves lower latency, higher throughput, and less resource waste on dynamic streams. |
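The abstract does not detail how TSA classifies hot and cold data or how it relaxes its imbalance constraint. As a rough illustration only of the general pattern TSA builds on, the sketch below keeps each cold key on a single worker and spreads hot-key tuples by “least allocation first”. It assumes a simple frequency-share rule for hotness and a CRC32 hash for cold keys. The class `HotColdPartitioner` and its parameters `hot_threshold` and `min_samples` are hypothetical, and the sketch does not reproduce TSA's increment or locality strategies.

```python
import zlib
from collections import Counter
from typing import List


class HotColdPartitioner:
    """Hypothetical sketch: hash cold keys, send hot-key tuples to the least-loaded worker."""

    def __init__(self, num_workers: int, hot_threshold: float = 0.1, min_samples: int = 100):
        self.num_workers = num_workers
        self.hot_threshold = hot_threshold  # assumed rule: a key is hot above this share of traffic
        self.min_samples = min_samples      # no key is treated as hot before this many tuples
        self.key_counts: Counter = Counter()
        self.total = 0
        self.loads: List[int] = [0] * num_workers  # tuples assigned to each worker so far

    def is_hot(self, key: str) -> bool:
        return (self.total >= self.min_samples
                and self.key_counts[key] > self.hot_threshold * self.total)

    def assign(self, key: str) -> int:
        """Return the worker index for one tuple with this key."""
        self.key_counts[key] += 1
        self.total += 1
        if self.is_hot(key):
            # "Least allocation first": the worker with the fewest assigned tuples wins.
            worker = min(range(self.num_workers), key=self.loads.__getitem__)
        else:
            # Cold keys always map to the same worker, so their state lives in one place.
            worker = zlib.crc32(key.encode("utf-8")) % self.num_workers
        self.loads[worker] += 1
        return worker
```

Only hot keys pay the cost of having their state split across workers. Cold keys keep full locality. This is the balance-versus-overhead trade-off that the imbalance-aware strategies above are described as tuning.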
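The abstract also does not give the form of ALPS's performance model or partitioning decision model. The sketch below shows, under stated assumptions, what a metrics-driven decision of this kind can look like. The parallelism estimate borrows the DS2-style rule: divide the input rate by each instance's true processing rate, meaning tuples processed per second of useful work. The partitioning switch compares the busiest instance's load with the mean load. The names `OperatorMetrics`, `decide`, `skew_threshold`, and `headroom` are hypothetical, and this is not ALPS's actual model.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OperatorMetrics:
    """Hypothetical per-window metrics for one operator."""
    input_rate: float       # observed arrival rate (tuples/s)
    processed: List[float]  # tuples processed by each instance in the window
    busy_time: List[float]  # seconds each instance spent doing useful work


def decide(m: OperatorMetrics, skew_threshold: float = 1.5,
           headroom: float = 1.0) -> Tuple[int, bool]:
    """Return (parallelism, use_key_splitting) for the next window (sketch)."""
    # DS2-style true processing rate: tuples per second of useful work.
    rates = [p / b for p, b in zip(m.processed, m.busy_time) if b > 0]
    if not rates:
        raise ValueError("no instance reported useful work")
    per_instance = sum(rates) / len(rates)
    # Enough instances to absorb the input rate, scaled by an optional headroom factor.
    parallelism = max(1, math.ceil(headroom * m.input_rate / per_instance))

    # Skew: busiest instance's load relative to the mean load.
    mean_load = sum(m.processed) / len(m.processed)
    skew = max(m.processed) / mean_load if mean_load > 0 else 1.0
    # Only pay for key splitting (and its extra state) when the stream is skewed.
    use_key_splitting = skew > skew_threshold
    return parallelism, use_key_splitting
```

For example, with an input rate of 10,000 tuples/s and two instances that each sustain 4,000 tuples/s, the rule asks for three instances. Key splitting is enabled only when one instance's load clearly exceeds the mean, so a balanced stream does not pay the partitioning overhead.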