
Dataflow Runtime System On Heterogeneous Convergence Platform

Posted on: 2020-04-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Lin
GTID: 1368330578481652
Subject: Computer system architecture
Abstract:
As semiconductor technology approaches its physical limits and new applications such as big data and artificial intelligence emerge, microprocessor chips are becoming increasingly specialized in order to achieve better energy efficiency. New domain-specific accelerator chips keep appearing, and as the variety of accelerator hardware grows, high-performance computing systems have evolved from simple heterogeneous structures to more complex ones. Integrating these heterogeneous accelerators into a unified software ecosystem, reducing their differences in programming and operational efficiency, and achieving high-performance computing is a very challenging problem. In particular, synchronization and data movement become very expensive because of the hardware diversity in highly heterogeneous systems. A coarse-grained parallel computing model such as the Bulk Synchronous Parallel (BSP) model needs a large number of synchronization operations to coordinate computation and therefore cannot organize computation efficiently. If a dataflow model is used instead to organize fine-grained parallel computing and to express dependencies between tasks in a point-to-point manner, the costly global synchronization operations in heterogeneous systems can be eliminated, and the performance bottlenecks caused by uneven task partitioning and hardware diversity can be minimized.

However, the dataflow model still faces many problems in practice, including the general abstraction of applications and heterogeneous platforms, system resource allocation, efficient task scheduling in new scenarios, and ensuring computational efficiency at every level when the model is integrated with practical applications. Studying and resolving these problems helps to rethink the program execution model in the era of hyper-heterogeneous computing, and provides a reference for the unified programming and efficient computing of applications on large-scale, complex heterogeneous platforms in the future.

From the perspective of the runtime, this dissertation studies several key problems of the dataflow model on heterogeneous platforms and focuses on how the dataflow program execution model can organize heterogeneous computing more effectively. A dataflow runtime simulator and its performance model are built through a generalized abstraction of programs and heterogeneous platforms. On this basis, a fine-grained task scheduling algorithm with higher scheduling efficiency is proposed, based on the hardware and software features of the dataflow runtime system on heterogeneous platforms. For practical systems, the work concentrates on a dataflow runtime software system for deep learning. The main research work and results cover the following four aspects:

1. This dissertation proposes a more general abstract machine model and an abstract program model based on directed acyclic graphs (DAGs), and constructs a general dataflow runtime model by summarizing the existing dataflow program execution models. In addition, a dataflow runtime simulator, TripletRun, is designed. The simulator implements the state-of-the-art heuristic task scheduling algorithms for heterogeneous systems and provides an extension interface for implementing new scheduling algorithms. It also offers a new perspective for exploring new dataflow models. TripletRun clearly defines the different behaviors of tasks during program execution, which ensures accurate simulation of program behavior at the runtime level, and it provides several metrics for evaluating program performance.

2. In a dataflow runtime on heterogeneous platforms, the task scheduling problem is more complex. After studying several state-of-the-art task scheduling algorithms for heterogeneous systems, this dissertation proposes DONF, a task scheduling algorithm based on the weighted out-degree of task nodes that combines the characteristics of the dataflow program execution model and of heterogeneous systems. In the dataflow program execution model, the number of tasks is larger and the dependencies between tasks are more complex. DONF calculates task priority in a simpler way, which reduces the time complexity of the task selection phase and avoids traversing the program's DAG. This enables DONF to support dynamic scheduling. Hardware differs greatly within a heterogeneous system, so communication plays a more important role during program execution. DONF takes contention on communication links into account and constructs a communication model to better select processors for scheduled tasks. Compared with state-of-the-art scheduling algorithms for heterogeneous systems, the DONF family of algorithms reduces the scheduling length ratio (SLR) by 34.6% to 65.8% and improves parallel efficiency by 19% to 137%.

3. TensorFlow is a popular deep learning framework based on the dataflow program execution model. This dissertation constructs swFLOW, a dataflow deep learning framework, on the Sunway TaihuLight supercomputer. After performance analysis and optimization, swFLOW achieves a speedup of 10.42 on a single core group (CG). For large-scale distributed deep learning, the work focuses on optimizing communication and data fetching in the runtime. The parallel efficiency of swFLOW is 81.01% for 512 processes. As one of the earliest frameworks supporting distributed deep learning on the Sunway supercomputer, swFLOW is very important for the development of the deep learning software ecosystem on the Sunway supercomputer and for future hardware and software co-design for deep learning applications.

4. As an attempt to combine theoretical research with practical systems, a unified scheduling framework is proposed by combining TripletRun with TensorFlow/swFLOW. The unified scheduling framework hides the implementation details of the task scheduling strategy in the actual system, facilitates the rapid implementation and verification of new scheduling algorithms, and allows solution-space search methods to be used for task scheduling or mapping. Moreover, it lets TensorFlow/swFLOW parallelize computation automatically, which avoids manual partitioning and repeated trials to find the best partitioning and allocation of a neural network. In addition, the mapping strategy determined by the unified scheduling framework can break the limitation that some tightly coupled operators are bound together, and can find a parallel strategy in a larger solution space. Preliminary experimental results show the feasibility and practicality of the unified scheduling framework.

The research in this dissertation focuses on the dataflow runtime system on heterogeneous platforms. With task scheduling as the main thread, it also addresses communication and data fetching, and covers both theoretical research and practical systems. The key problems of the dataflow runtime system on heterogeneous platforms are studied and discussed in depth. The dataflow runtime system model designed here abstracts the execution of dataflow programs on a heterogeneous convergence platform well. The proposed task scheduling algorithm performs better than several state-of-the-art scheduling algorithms for heterogeneous systems. The design and implementation of the swFLOW framework provides a reference for constructing dataflow deep learning frameworks on similar platforms and contributes to the development of the deep learning software ecosystem on the Sunway system.
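
To make the scheduling idea in contribution 2 more concrete, the following Python sketch shows a small list scheduler that ranks ready tasks by a weighted out-degree and then evaluates the result with SLR and parallel efficiency. It is an illustration under stated assumptions, not the DONF algorithm itself: the abstract does not give the weighting formula, so the sketch assumes each successor contributes 1 / in-degree; it ignores communication cost and link contention, which DONF does model; and SLR and efficiency follow the definitions commonly used in the heterogeneous scheduling literature. All function names are hypothetical.

```python
def build_graph(tasks, edges):
    """Return successor and predecessor lists for a DAG given as (u, v) edges."""
    succ = {t: [] for t in tasks}
    pred = {t: [] for t in tasks}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    return succ, pred


def weighted_out_degree(succ, pred):
    # ASSUMPTION: a successor s adds 1 / in_degree(s) to its parent's priority.
    # Only local edges are inspected; the DAG is never traversed.
    return {t: sum(1.0 / len(pred[s]) for s in succ[t]) for t in succ}


def list_schedule(tasks, edges, cost):
    """cost[t][p] is the run time of task t on processor p (communication ignored)."""
    succ, pred = build_graph(tasks, edges)
    prio = weighted_out_degree(succ, pred)
    waiting = {t: len(pred[t]) for t in tasks}      # unscheduled predecessors
    ready = [t for t in tasks if waiting[t] == 0]
    n_proc = len(next(iter(cost.values())))
    proc_free = [0] * n_proc                        # time each processor becomes idle
    finish, placement = {}, {}
    while ready:
        # Highest priority first; ties are broken by task name for determinism.
        task = min(ready, key=lambda t: (-prio[t], t))
        ready.remove(task)
        data_ready = max((finish[p] for p in pred[task]), default=0)
        # Pick the processor with the earliest finish time for this task.
        best = min(range(n_proc),
                   key=lambda p: max(proc_free[p], data_ready) + cost[task][p])
        finish[task] = max(proc_free[best], data_ready) + cost[task][best]
        proc_free[best] = finish[task]
        placement[task] = best
        for s in succ[task]:
            waiting[s] -= 1
            if waiting[s] == 0:
                ready.append(s)
    return placement, finish


def slr_and_efficiency(tasks, edges, cost, finish):
    """SLR = makespan / critical path length under per-task minimum cost.
    Efficiency = (best single-processor time / makespan) / processor count."""
    _, pred = build_graph(tasks, edges)
    makespan = max(finish.values())
    longest = {}

    def path(t):  # longest path ending at t, using each task's minimum cost
        if t not in longest:
            longest[t] = min(cost[t]) + max((path(p) for p in pred[t]), default=0)
        return longest[t]

    cp_min = max(path(t) for t in tasks)
    n_proc = len(next(iter(cost.values())))
    sequential = min(sum(cost[t][p] for t in tasks) for p in range(n_proc))
    return makespan / cp_min, (sequential / makespan) / n_proc


# Usage: a four-task diamond DAG on two processors with different speeds.
tasks = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
cost = {"A": [2, 3], "B": [3, 2], "C": [4, 4], "D": [1, 2]}
placement, finish = list_schedule(tasks, edges, cost)
slr, efficiency = slr_and_efficiency(tasks, edges, cost, finish)
print(placement, max(finish.values()), slr, efficiency)
```

Because the priority depends only on a task's direct successors, it can be computed when a task appears, without walking the rest of the graph. That matches the abstract's point that a simpler priority supports dynamic scheduling. A lower SLR and a higher efficiency both indicate a better schedule.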
Keywords: Dataflow, Program Execution Model (PXM), Heterogeneous System, Runtime, Task Scheduling, Deep Learning, Distributed Computing