Recent years have witnessed the widespread use of distributed computing in the big data area. A number of sophisticated distributed data-parallel computing frameworks, such as Hadoop MapReduce [1], Spark [2], Dryad [3], and Tez [4], have been developed and deployed to accelerate big data processing. Most of these frameworks perform computation by transforming the application logic into a directed acyclic graph (DAG). To increase parallelism, each computing stage is usually managed according to the Bulk-Synchronous Parallel (BSP) model during DAG execution. Shuffle, the cross-network read and aggregation of partitioned data among tasks with data dependencies in consecutive execution stages, usually incurs heavy network transfer. Owing to the dependency constraints and the limited performance of disks and networks, the execution of descendant tasks can be delayed by inefficient shuffles. This delay can further slow down the whole application. The performance degradation introduced by shuffle can become overwhelming in shuffle-intensive applications. Moreover, these deficiencies of shuffle exist in most DAG data-parallel computing frameworks.

In this paper, we identify the common issues in current shuffle mechanisms: 1) coarse-grained resource management decreases the utilization and multiplexing of hardware resources; 2) the synchronized shuffle read increases the explicit network waiting time during task execution and causes a network burst that further slows down the shuffle read itself.

Based on the above observations, we present S(huffle)Cache, an open-source plug-in system that focuses on shuffle optimization in frameworks that define jobs as DAGs. By extracting and analyzing the DAGs and shuffle dependencies prior to the actual task execution, SCache can take full advantage of fine-grained resource management and system memory to accelerate the shuffle process. Meanwhile, SCache manages the shuffle data outside the frameworks and transfers data asynchronously, which helps overlap the network transfer with task execution and avoids network bursts. In addition, SCache provides an application-context-aware in-memory shuffle data management scheme to further accelerate the shuffle process. To achieve these optimizations, we make the following contributions:

1. We decouple the shuffles and manage them outside the DAG data-parallel computing frameworks, making shuffle data management more efficient.
2. We implement shuffle data pre-fetch with application context, so that network bursts are avoided and the network transfer time is overlapped with execution phases.
3. We implement application-context-aware in-memory shuffle data management to accelerate the shuffle process.
4. We design and implement general APIs for DAG data-parallel computing frameworks, so that the optimizations can be applied easily.

We have implemented SCache and customized Apache Spark [2] to use it as an external shuffle service and co-scheduler. The performance of SCache is evaluated with both simulations and testbed experiments on a 50-node Amazon EC2 cluster. The evaluations demonstrate that, by incorporating SCache, the shuffle overhead of Spark can be reduced by nearly 89%, and the overall completion time of TPC-DS queries improves by 40% on average.
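The core idea of overlapping shuffle transfer with execution can be sketched as follows. This is a minimal, hypothetical illustration (the `ShuffleCache`, `prefetch`, and `read` names are our own, not SCache's actual API): because the DAG exposes shuffle dependencies before reduce tasks launch, an external service can start fetching map outputs asynchronously, so a later shuffle read blocks only if the pre-fetch has not yet completed.

```python
import threading
import time

class ShuffleCache:
    """Hypothetical external shuffle service: fetches partitioned map
    outputs asynchronously so reduce tasks find the data in memory."""

    def __init__(self):
        self._cache = {}   # (shuffle_id, partition) -> fetched data
        self._ready = {}   # (shuffle_id, partition) -> completion event

    def prefetch(self, shuffle_id, partition, fetch_fn):
        """Start an asynchronous fetch as soon as the DAG reveals the
        shuffle dependency, before the reduce task is launched."""
        key = (shuffle_id, partition)
        self._ready[key] = threading.Event()

        def worker():
            self._cache[key] = fetch_fn(partition)  # cross-network read
            self._ready[key].set()

        threading.Thread(target=worker, daemon=True).start()

    def read(self, shuffle_id, partition):
        """Reduce-side shuffle read: blocks only if the pre-fetch has
        not finished, so network time overlaps earlier execution."""
        key = (shuffle_id, partition)
        self._ready[key].wait()
        return self._cache[key]


def simulated_fetch(partition):
    time.sleep(0.05)  # stand-in for a network transfer
    return [partition * 10 + i for i in range(3)]


cache = ShuffleCache()
for p in range(4):  # the co-scheduler sees all dependencies up front
    cache.prefetch(shuffle_id=0, partition=p, fetch_fn=simulated_fetch)

# ... map-stage tasks of other partitions keep running here ...
result = cache.read(shuffle_id=0, partition=2)
print(result)  # [20, 21, 22]
```

Staggering the per-partition fetches over the whole execution phase, rather than issuing them all at the synchronization barrier, is also what avoids the network burst described above.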