
Cloud Platform Optimization Techniques Based On Scheduling And Stochastic Algorithms

Posted on: 2019-01-13    Degree: Doctor    Type: Dissertation
Country: China    Candidate: F Wang    Full Text: PDF
GTID: 1360330623463939    Subject: Computer Science and Technology

Abstract/Summary:
The rapid growth of cloud service providers has made leased computation power available at relatively low prices. Such services are a great help to customers who urgently need large-scale data-processing capability on a limited budget: customers can flexibly choose the level of computation service according to their computation targets and available funds. Meanwhile, with the fast industrialization of big data and AI applications, the demand for large-scale sparse dataset processing is growing rapidly. This forces cloud service providers to adapt their servers to handle not only traditional services such as video streaming and HTTP serving, but also computation-bound workloads such as neural network training. From the cloud providers' point of view, the key research question is how to satisfy the requirements of more customers with less investment. Assuming fixed hardware prices, making the computing system more efficient through optimized algorithms is the most rational way to achieve this goal. This thesis studies methods that improve the service quality (performance stability, convergence rate) delivered by cloud providers from two aspects: system-level task scheduling and task-level acceleration of optimization solvers.

Compared with previous research on maximizing the total performance of servers, cloud servers have several special requirements that make that goal unsuitable. As stated above, cloud servers run not only computation-bound applications but also human-machine interactive applications that are highly delay/stability sensitive. Simply pursuing the highest overall IPC is likely to leave the delay/stability-sensitive applications below the level users will accept: for example, users are usually fine with low-resolution video delivered smoothly but tend to reject high-resolution video with heavy jitter, and such a poor user experience is unacceptable for cloud providers. On the other hand, a large portion of the computation-bound applications running on cloud servers are stochastic optimization tasks such as neural network training and regression modeling, for which the servers must still maintain high performance so that the solvers converge quickly. We resolve this contradiction with methods at the two levels mentioned above. In this thesis, we study and propose two algorithms that give cloud servers both high performance stability for delay-sensitive applications and a faster convergence rate for computation-bound stochastic optimization.

· System-level optimization. In the research on guaranteeing performance stability, most previous work has focused on resource specification or partitioning. Although such methods help guarantee worst-case performance and performance stability, they still incur heavy overall performance degradation due to low resource utilization. Unlike these methods, we propose a novel performance-stability-oriented application scheduling algorithm based on cache fingerprints and a cache-contention model. With the optimized process scheduling algorithm, the IPC of user applications remains stable at a relatively high level, so QoS targets are met with higher confidence. Since we do not rigidly partition or dedicate resources to specific tasks, we avoid the low-efficiency problems of task partitioning, as sketched below.
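The following is a minimal sketch of the co-runner pairing idea behind such cache-contention-aware scheduling. The single-number cache fingerprint, the toy contention model, and all names (LLC_SIZE_MB, predicted_contention, pair_co_runners) are illustrative assumptions, not the model developed in the thesis.

    # Minimal sketch (assumed names and a toy model, not the thesis implementation):
    # every task carries a "cache fingerprint", reduced here to one number, the
    # estimated last-level-cache (LLC) footprint in MB; tasks are paired onto cores
    # that share an LLC so that the predicted contention of each pair stays small,
    # instead of statically partitioning the cache.

    from typing import List, Tuple

    LLC_SIZE_MB = 32.0  # assumed capacity of the shared last-level cache

    def predicted_contention(fp_a: float, fp_b: float) -> float:
        # Toy model: contention grows with the amount by which the combined
        # footprint overflows the shared cache.
        return max(0.0, (fp_a + fp_b) - LLC_SIZE_MB)

    def pair_co_runners(fingerprints: List[Tuple[str, float]]) -> List[Tuple[str, str]]:
        # Greedy pairing: co-schedule the most cache-hungry task with the least
        # cache-hungry one, which keeps the worst pairwise contention small.
        tasks = sorted(fingerprints, key=lambda t: t[1])
        pairs = []
        while len(tasks) >= 2:
            small = tasks.pop(0)
            large = tasks.pop(-1)
            pairs.append((small[0], large[0]))
        return pairs  # any leftover task simply runs without a co-runner

    if __name__ == "__main__":
        apps = [("video-stream", 4.0), ("http-server", 2.0),
                ("nn-training", 28.0), ("regression", 20.0)]
        for a, b in pair_co_runners(apps):
            print(a, "+", b)

The point of the sketch is the design choice stated above: tasks are placed so that predicted contention stays low, but no part of the cache is reserved for any single task, so utilization is not sacrificed.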
· Task-level optimization. With the emergence of multi-core servers and the rapid progress of neural networks (NNs), training NNs on multi-core cloud servers has become popular. As a consequence, asynchronous stochastic gradient descent (ASGD), which exploits the concurrency of multi-core hardware, has become the de facto solver for such tasks. Acceleration techniques for SGD have been proposed recently, but they cannot be transferred directly to ASGD: ASGD is mainly used on large-scale sparse datasets, and existing acceleration techniques fail to preserve the sparsity of the gradient, which makes them extremely inefficient in these cases; as a result, acceleration techniques for ASGD have been missing even though they are badly needed. To address this, we propose a novel acceleration technique for ASGD based on importance sampling, which is widely used in optimization tasks on the cloud. Importance sampling (IS) is a stochastic variance reduction technique that increases the convergence rate of stochastic gradient descent (SGD) and has recently been reported to be successful for SGD. This thesis extends IS to ASGD, achieving an accelerated convergence rate with a proven convergence bound; an illustrative sketch of the sampling rule is given at the end of this abstract. With the system-level process scheduling algorithm and the task-level ASGD acceleration technique, the efficiency of a cloud computing system can be substantially increased, i.e., a higher optimization convergence rate together with higher overall performance stability.

· Using IS-ASGD in RNNs. Most research on SGD/ASGD has been conducted on simple applications such as linear regression; in particular, there have been no reports on applying IS-ASGD to the training of recurrent neural networks (RNNs). The third contribution of this thesis is accelerating RNN training with IS-ASGD. Through a detailed analysis of RNNs, we develop the theory for using IS-ASGD efficiently in RNNs and validate our claims with experimental evaluations on four of the most popular RNN applications.

The three techniques above are orthogonal and can therefore be applied jointly on cloud servers. The research results have been accepted by SCI journals and a well-recognized high-performance computing conference, respectively. The innovation program on which this thesis is based was rated an excellent project. Where permitted, the source code is publicly available in the author's GitHub repository.
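As a rough illustration of the importance sampling idea referred to above, the sketch below applies IS to plain (synchronous) SGD on a least-squares problem; the quadratic loss, the per-example bounds based on row norms, and all variable names are assumptions made for the example, not the IS-ASGD algorithm or the convergence bound derived in the thesis.

    # Importance-sampled SGD on a synthetic least-squares problem (illustrative only).
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 20
    # Rows with very different scales: the setting where importance sampling helps.
    A = rng.standard_normal((n, d)) * rng.uniform(0.1, 5.0, size=(n, 1))
    b = A @ rng.standard_normal(d)

    # Sampling distribution proportional to per-example gradient-norm bounds
    # (for the loss 0.5 * (a_i @ x - b_i)**2 this bound is ||a_i||^2).
    L = np.linalg.norm(A, axis=1) ** 2
    p = L / L.sum()

    x = np.zeros(d)
    step = 0.5 / L.mean()
    for _ in range(20000):
        i = rng.choice(n, p=p)
        grad_i = (A[i] @ x - b[i]) * A[i]
        # Reweighting by 1/(n * p_i) keeps the sampled gradient an unbiased
        # estimate of the full gradient while its variance is reduced.
        x -= step * grad_i / (n * p[i])

    print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))

Sampling examples in proportion to their gradient-norm bounds and reweighting by 1/(n·p_i) keeps the stochastic gradient unbiased while reducing its variance, which is the mechanism behind the faster convergence rate; the thesis extends this weighting to the asynchronous, sparse setting.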
Keywords/Search Tags:Cache co-runner scheduling algorithm, performance stability, asynchronous SGD, importance sampling, convergence acceleration