
Performance Modeling Techniques For Deep Learning Applications

Posted on: 2024-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: H X Li
Full Text: PDF
GTID: 2568306929990329
Subject: Computer Science and Technology
Abstract/Summary:
Deep learning, an important branch of artificial intelligence, has received widespread attention and application in recent years. The growth of high-quality data, improvements in high-performance computing, and the popularization of deep learning frameworks have driven its rapid development. However, as AI/ML heterogeneous systems and deep learning application models continue to evolve, the co-design and performance optimization of software and hardware for AI applications and heterogeneous accelerators are becoming increasingly difficult. To understand and guide this complex interaction between software and hardware, there is an urgent need for a holistic performance modeling method that, from the perspective of upper-layer applications, can accurately measure the overall characteristics of applications, the complexity of the whole software stack, and the performance of the underlying physical hardware, in order to support the efficient operation of DNN models in future systems. The main work and achievements of this thesis are summarized as follows:

(1) To ensure modeling accuracy and measure underlying hardware performance, a machine learning approach is used to build performance prediction models for DNN operators. This is a data-driven modeling method that learns patterns from data and makes predictions through model training. First, the runtime information of DNN operators on actual heterogeneous systems is collected using profiling tools to construct the operator dataset. This dataset contains underlying hardware information and thereby reflects the performance of the underlying hardware. Then, feature engineering is designed for the characteristics of each operator type, and deep learning methods are trained on the dataset to build an end-to-end performance prediction model. On the test set, the average prediction error of the performance model for operator execution time is 8.8%.

(2) This thesis models the performance of deep learning applications with a dataflow graph simulator. In deep learning frameworks, the application model is abstracted as a dataflow graph, where nodes represent DNN operators and edges represent tensor flow paths. The thesis designs and implements a dataflow graph-based heterogeneous simulator, which takes the dataflow graph as input so that the simulator remains general across deep learning applications. The simulator simulates the execution schedule of the dataflow graph and uses the machine learning-based operator performance model to predict the execution time of each operator in the graph, thus achieving performance modeling of deep learning applications. The simulator achieves an average prediction error of 15.4% in execution time for common models from various fields on the Huawei Ascend 910 architecture.

(3) To explore efficient implementation of deep learning applications, this thesis studies task scheduling in dataflow runtimes on heterogeneous platforms. Combining the programming model of the dataflow runtime with the characteristics of heterogeneous systems, a Donf algorithm is implemented on the task scheduling interface provided by the simulator; its main idea is to schedule task nodes based on weighted out-degree. In addition, to further improve the parallelism of deep learning applications, a tensor-based partitioning strategy is implemented on the operator partitioning interface provided by the simulator. Tensor partitioning maximizes the utilization of hardware resources and improves computing efficiency.
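
The simulation idea in (2) and (3) can be illustrated with a minimal sketch in Python. It is not the thesis's simulator or its Donf implementation: the function `simulate`, its parameters `graph`, `op_time`, and `num_devices`, and the priority rule (out-degree weighted by the predicted time of each successor) are all assumptions made for illustration. Fixed per-operator times stand in for the learned operator performance model, and communication cost between devices is ignored.

```python
import heapq


def simulate(graph, op_time, num_devices=2):
    """Greedy list scheduling over a dataflow graph (illustrative only).

    graph:   dict mapping each node to a list of its successor nodes;
             every node must appear as a key.
    op_time: dict mapping each node to its predicted execution time
             (a stand-in for a learned operator performance model).
    Returns (makespan, schedule), where schedule maps
    node -> (device, start, end).
    """
    # Count unfinished predecessors of every node.
    indegree = {n: 0 for n in graph}
    for succs in graph.values():
        for s in succs:
            indegree[s] += 1

    # ASSUMPTION: the priority is the out-degree weighted by the predicted
    # time of each successor. The thesis's actual weighting is not given.
    priority = {n: sum(op_time[s] for s in graph[n]) for n in graph}

    # Earliest time at which all inputs of a node are available.
    ready_at = {n: 0.0 for n in graph}

    # Max-priority queue of ready nodes (heapq is a min-heap, so negate).
    ready = [(-priority[n], n) for n in graph if indegree[n] == 0]
    heapq.heapify(ready)

    device_free = [0.0] * num_devices  # time at which each device is idle
    schedule = {}

    while ready:
        _, node = heapq.heappop(ready)
        # Place the node on the device that becomes idle first.
        dev = min(range(num_devices), key=lambda d: device_free[d])
        start = max(device_free[dev], ready_at[node])
        end = start + op_time[node]
        device_free[dev] = end
        schedule[node] = (dev, start, end)

        # Release successors whose inputs are now all produced.
        for s in graph[node]:
            ready_at[s] = max(ready_at[s], end)
            indegree[s] -= 1
            if indegree[s] == 0:
                heapq.heappush(ready, (-priority[s], s))

    return max(device_free), schedule


# A small diamond-shaped graph: a feeds b and c, which both feed d.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
op_time = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 1.0}

makespan, schedule = simulate(graph, op_time, num_devices=2)
print(makespan)  # predicted end-to-end execution time
```

In this toy graph, running b and c on separate devices shortens the predicted end-to-end time compared with a single device, which is the kind of scheduling effect such a simulator is meant to expose.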
Keywords/Search Tags: Performance Modeling, Dataflow Runtime, Heterogeneous Scheduling Algorithms, Tensor Parallelism, Heterogeneous Simulator, Machine Learning