
Research On Parallel Computing Of Deep Learning Inference Service System

Posted on: 2023-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: P C Wei
GTID: 2558307097494794
Subject: Computer technology

Abstract/Summary:
The development of artificial intelligence, IoT, and cloud computing has given birth to numerous deep learning applications, such as voice recognition and machine translation. However, facing massive inference requests and bulky DNNs, existing systems cannot finish the inference procedure in a low-latency manner, which prevents DL technologies from being applied to latency-sensitive applications and limits their practical use. Thus, in this thesis, we study the key parallel computing technologies for building a low-latency inference system.

We first study the feature selection problem in inference procedures. Emerging computer applications often face huge volumes of data that must be stored and processed in a distributed way. Thus, to accelerate the inference procedure, we should process large-scale data and remove redundant features in a distributed manner. We propose a distributed integrated feature selection scheme (DIFS) with Subset Quality Evaluation (SQE). SQE studies the relation between the quality of a subset and the number of features selected from that subset, which helps shorten the feature selection time efficiently. Both the feature selection algorithm used in our method and the evaluation metric used in SQE are pluggable. We then implement our scheme for the Column Subset Selection (CSS) problem; specifically, we integrate a CSS algorithm into DIFS and use information entropy as the SQE metric. Theoretically, we prove that the speedup of DIFS over the centralized algorithm can reach the number of computational nodes in ideal situations, and we give a well-bounded approximation guarantee for the CSS solution generated by the scheme. Extensive experiments on eight data sets verify the performance of the scheme and demonstrate the effectiveness of SQE and the impressive speedup DIFS can achieve: for example, running DIFS on the sEMG data set with 8 computing nodes yields a speedup of 4301, although the reconstruction error increases slightly in some situations. Additional classification experiments reveal that DIFS outperforms existing state-of-the-art distributed CSS algorithms.

We then study how to accelerate the inference procedure for large-scale DNNs. Large-scale DNNs contain many computational operators and numerous weights, which leads to high inference latency; moreover, a single device may not be able to load a DNN that is too big to fit in its memory. Thus, to decrease inference latency and tackle the problem of deploying large DNNs, we propose a deep-learning-based, multi-objective data-flow graph partition method. Specifically, from the perspective of data-flow graphs, we model the DNN partition problem so that the DNN can be computed on several devices simultaneously. Based on this model, we design a loss function that optimizes multiple objectives at once, including the normalized cut ratio and the balance of the sub-graphs. To minimize the loss function, we propose a deep learning model that contains a multi-branch feature extractor and a node placer. Experiments show that our method converges quickly, optimizes the two metrics simultaneously, and generates better partitions than state-of-the-art graph partition algorithms.
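To make the first contribution concrete, below is a minimal single-machine sketch of the DIFS idea: columns are split into blocks (standing in for computing nodes), an entropy-based SQE score apportions the selection budget across blocks, and a greedy CSS routine runs on each block. The function names, the 32-bin entropy estimate, and the greedy deflation criterion are illustrative assumptions, not the thesis's actual algorithm.

import numpy as np

def sqe_entropy(block):
    # SQE metric (assumed form): information entropy of the block's values,
    # estimated from a 32-bin histogram.
    hist, _ = np.histogram(block, bins=32)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def local_css(block, k):
    # Greedy CSS on one node's column block (assumed criterion: repeatedly
    # pick the column most aligned with the current residual, then deflate).
    selected, residual = [], block.copy()
    for _ in range(k):
        scores = np.linalg.norm(residual.T @ block, axis=0)
        scores[selected] = -np.inf
        j = int(np.argmax(scores))
        selected.append(j)
        col = block[:, j:j + 1]
        residual -= col @ (col.T @ residual) / (col.T @ col + 1e-12)
    return selected

def difs(X, k, nodes):
    # Split columns across blocks (one per "node"), let SQE apportion the
    # selection budget, run local CSS per block, and merge the picks.
    blocks = np.array_split(np.arange(X.shape[1]), nodes)
    quality = np.array([sqe_entropy(X[:, b]) for b in blocks])
    budget = np.maximum(1, np.round(k * quality / quality.sum())).astype(int)
    chosen = []
    for b, kb in zip(blocks, budget):
        chosen += [int(b[j]) for j in local_css(X[:, b], min(int(kb), len(b)))]
    return sorted(chosen)[:k]

X = np.random.rand(200, 64)
print(difs(X, k=8, nodes=4))

In this toy setup the local CSS calls are independent, so distributing the blocks across real nodes would parallelize the dominant cost, which is the intuition behind the speedup claim above.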
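For the second contribution, the sketch below illustrates a differentiable multi-objective partition loss of the kind the abstract describes, combining a normalized-cut relaxation with a sub-graph balance penalty over soft node assignments (in the spirit of published differentiable partitioners such as GAP). The two-layer extractor-plus-placer model, the dense adjacency, and all hyperparameters are assumptions standing in for the thesis's actual design.

import torch
import torch.nn as nn

class Partitioner(nn.Module):
    # Assumed stand-in for the thesis's model: a small feature extractor
    # followed by a "node placer" that emits soft part assignments.
    def __init__(self, in_dim, hidden, parts):
        super().__init__()
        self.extract = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.place = nn.Linear(hidden, parts)

    def forward(self, feats):
        # Row i of Y is node i's probability distribution over the parts.
        return torch.softmax(self.place(self.extract(feats)), dim=1)

def partition_loss(Y, A, alpha=1.0):
    # Multi-objective loss: a normalized-cut relaxation plus a penalty
    # pushing the parts toward equal size. Y: (n, g) soft assignments,
    # A: (n, n) dense symmetric adjacency.
    d = A.sum(dim=1)                       # node degrees
    vol = Y.t() @ d + 1e-9                 # volume of each part
    cut = ((Y / vol) * (A @ (1.0 - Y))).sum()        # expected Ncut
    n, g = Y.shape
    balance = ((Y.sum(dim=0) - n / g) ** 2).sum()    # size imbalance
    return cut + alpha * balance

# Toy usage: partition a random 12-node graph into 3 sub-graphs.
n, g = 12, 3
A = (torch.rand(n, n) < 0.3).float()
A = ((A + A.t()) > 0).float()
A.fill_diagonal_(0)
feats = torch.randn(n, 8)
model = Partitioner(in_dim=8, hidden=16, parts=g)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    loss = partition_loss(model(feats), A)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(model(feats).argmax(dim=1))  # hard placement of each node

Because both objectives appear in one differentiable loss, gradient descent trades off cut ratio against balance directly, which is what lets the model optimize the two metrics at the same time.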
Keywords/Search Tags: Deep learning inference system, Low latency, Distributed feature selection, Subset quality evaluation, Data-flow graph partition, Multi-objective optimization