
Research On Key Methods Of Parallel Machine Learning Model Training

Posted on: 2023-03-23    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L Guan    Full Text: PDF
GTID: 1528307169977159    Subject: Computer Science and Technology
Abstract/Summary:
Artificial intelligence technology generally requires machine learning models to process large amounts of training data. Due to the limits of memory and computing power, training machine learning (especially deep learning) models on a single computing device faces significant challenges. Therefore, exploring effective methods for training machine learning models in parallel across multiple computing devices has both academic significance and practical value. This thesis conducts an in-depth study of the parallel training of machine learning models, covering both non-convex penalized support vector machines (SVMs) and deep neural network (DNN) models.

The challenges addressed by this thesis fall into three aspects. First, regularized sparse models play a significant role in machine learning for processing high-dimensional data. Combining the SVM with sparsity-inducing non-convex penalties yields a model that supports both classification and variable selection. However, the non-convex regularized SVM cannot be solved directly by popular optimization methods, because its objective function is non-differentiable, non-convex, and non-smooth. Second, the dominant DNN training methods are SGD and its variants, but these gradient-based optimizers generally suffer from gradient vanishing and poor conditioning. In recent years, ADMM has opened a new direction for training DNN models in a gradient-free way; however, existing ADMM-based approaches cannot achieve a good trade-off between convergence and training speed, nor do they support parallel training on multi-GPU computing systems. Third, pipeline parallelism has become a mainstream approach to efficiently training large-scale DNN models on multi-GPU computing platforms. The two most representative pipeline approaches are GPipe and PipeDream: GPipe is a synchronous pipeline-parallel approach, while PipeDream is asynchronous. Nevertheless, the many "bubbles" in GPipe's pipeline structure lead to low GPU utilization, so GPipe cannot deliver sufficiently high throughput. PipeDream uses a "1F1B" (one-forward-one-backward) pipeline structure that guarantees high GPU utilization and throughput, yet its weight-stashing technique incurs substantial and unbalanced GPU memory overhead. Furthermore, the unresolved staleness problem in its asynchronous weight updates affects PipeDream's convergence behavior.

To address these three challenges in the parallel training of machine learning models, this thesis focuses on multi-core clusters and multi-GPU computing systems and proposes efficient solutions for each. First, to solve the non-convex penalized SVM effectively, this thesis proposes a fast and efficient ADMM-based approach, FEADMM, which solves, in parallel, the linear SVM combined with six different non-convex penalties. Experimental results on LIBSVM benchmark datasets show that FEADMM converges quickly and obtains better accuracy than the GIST approach; results on a multi-core cluster also demonstrate FEADMM's good scalability and its suitability for parallel processing of high-dimensional data. Furthermore, this thesis proves the convergence of the proposed algorithm for the non-convex penalized SVM problem.

Second, since prior ADMM-based approaches are unable to efficiently train DNN models on multi-GPU computing systems, this thesis proposes an ADMM-based data-parallel approach, pdlADMM. The approach first formulates the parallel training of a fully connected network on N GPUs as an optimization problem, then transforms it into a constrained optimization problem that can be solved by directly applying the ADMM framework. With ADMM, parallel training of a fully connected neural network reduces to iteratively solving four sub-problems in parallel; for each sub-problem, pdlADMM comprehensively balances computational complexity, convergence rate, and suitability for parallelism. Experimental results show that the proposed ADMM-based training method converges quickly, iterates fast, and scales well on multi-GPU computing platforms.

Third, to overcome the shortcomings of GPipe and PipeDream, this thesis proposes a weight-prediction-based pipeline-parallel approach, XPipe. XPipe adopts a micro-batch "1F1B" pipeline structure with two notable features: (1) the pipeline structure generates few "bubbles"; (2) the pipeline structure is stable, and does not change with the number of micro-batches in each mini-batch. These two features let XPipe consistently achieve high GPU utilization. More importantly, to simultaneously address the weight inconsistency and staleness issues of asynchronous weight updates, XPipe uses a novel and efficient weight-prediction approach, Brob. Experimental results on machine translation and image classification tasks show that XPipe's training efficiency is markedly better than that of both GPipe and PipeDream; for instance, when training GNMT-8 on WMT-16, XPipe's training efficiency is 2.9x and 2.0x that of GPipe and PipeDream, respectively.
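The abstract does not spell out the ADMM splitting behind FEADMM. As a hedged illustration of the general pattern only, the sketch below applies ADMM to an L1-penalized least-squares problem; the non-convex penalties studied in the thesis (e.g. SCAD, MCP) would differ chiefly in the proximal map used in the z-update. All names and parameters here are illustrative, not the thesis's actual implementation.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 penalty; a non-convex penalty
    # (SCAD, MCP, ...) would substitute its own proximal map here.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_l1_ls(A, b, lam=0.1, rho=1.0, iters=200):
    """ADMM for: min_x 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x = z."""
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)  # u: scaled dual
    AtA, Atb = A.T @ A, A.T @ b
    # Factor once; the x-update is then a cheap ridge-like linear solve.
    L = np.linalg.cholesky(AtA + rho * np.eye(n))
    for _ in range(iters):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = soft_threshold(x + u, lam / rho)   # penalty-specific prox step
        u = u + x - z                          # dual (consensus) update
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10); x_true[:3] = [2.0, -1.5, 1.0]   # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = admm_l1_ls(A, b, lam=0.5)
```

The z- and u-updates are embarrassingly parallel across coordinates, which is what makes the ADMM decomposition attractive for multi-core and multi-GPU training.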
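The claim about GPipe's "bubbles" can be made concrete with the standard schedule analysis. Assuming p pipeline stages, m micro-batches per mini-batch, and uniform per-stage compute times (assumptions for illustration, not figures from the thesis), each stage is busy for m of the m + p - 1 schedule slots, giving a bubble fraction of (p - 1)/(m + p - 1):

```python
def gpipe_bubble_fraction(p, m):
    """Fraction of idle GPU time in a GPipe-style schedule with p stages
    and m micro-batches, assuming uniform per-stage compute times:
    each stage is busy for m slots out of m + p - 1 total."""
    return (p - 1) / (m + p - 1)

# Growing m shrinks the bubble but raises activation memory; a stable
# "1F1B" schedule (as in PipeDream and XPipe) keeps utilization high
# without depending on the number of micro-batches.
ratios = {m: gpipe_bubble_fraction(4, m) for m in (4, 8, 32)}
```

For 4 stages and 8 micro-batches the bubble fraction is 3/11, i.e. roughly 27% of GPU time is idle, which matches the abstract's observation that GPipe struggles to reach high throughput.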
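The abstract does not describe how Brob predicts weights, so no attempt is made to reproduce it here. As a purely illustrative stand-in, the sketch below shows the general idea behind momentum-based weight prediction in asynchronous pipelines: a stage whose gradient will be applied s optimizer steps in the future runs its forward/backward pass on weights extrapolated s steps ahead along the momentum direction.

```python
import numpy as np

def momentum_step(w, v, g, lr=0.1, beta=0.9):
    """One SGD-with-momentum step."""
    v = beta * v + g
    return w - lr * v, v

def predict_weights(w, v, lr=0.1, staleness=3):
    """Illustrative weight prediction (a generic stand-in, not the thesis's
    Brob): extrapolate `staleness` steps ahead along the current momentum
    direction, so a pipeline stage computes with weights close to those
    that will exist when its gradient is finally applied."""
    return w - lr * staleness * v

# Sanity check: once the velocity reaches its steady state v = g/(1 - beta)
# under a constant gradient g, the prediction coincides exactly with
# actually taking `staleness` momentum steps.
g = np.array([1.0, -2.0])
beta, lr = 0.9, 0.1
v = g / (1 - beta)                 # steady-state velocity
w = np.zeros(2)
w_pred = predict_weights(w, v, lr=lr, staleness=3)
w_true, v_true = w.copy(), v.copy()
for _ in range(3):
    w_true, v_true = momentum_step(w_true, v_true, g, lr=lr, beta=beta)
```

Predicting weights in this spirit attacks both problems the abstract names: inconsistency (forward and backward passes see matching weights) and staleness (gradients are computed near the weights they will update).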
Keywords/Search Tags: Machine Learning, Deep Neural Network, Parallel Training, Non-convex Penalties, GPU, ADMM, Data Parallelism, Pipeline Parallelism