| Resource scheduling in a cloud cluster is the process of allocating cluster resources to application instances in a reasonable way. It mainly involves optimization techniques such as initial scheduling, rescheduling, parallel scheduling, and mixed scheduling. Scheduling has a significant impact on business performance, reliability, and resource utilization. The effectiveness of a scheduling approach must be verified experimentally, but experiments on online clusters can easily crash online services and are not repeatable. Simulating the resource scheduling process of a cloud cluster therefore has important practical significance. To address this problem, this thesis designs and implements a large-scale cloud cluster resource scheduling simulation system. The system is driven by cluster trace data and can restore and replay an online production cluster. It lets users plug in configurable scheduling and rescheduling algorithms and observe the effect of their algorithms on the cluster in a repeatable manner. The main contributions of this thesis are as follows. (1) The architecture and implementation of Lothar, a large-scale cluster resource scheduling simulation system. Lothar runs on traces from production clusters and consists of an event generation module, a scheduling module, a rescheduling module, a core management module, and a performance evaluation and visualization module. It can restore and replay the continuous operation of a cluster, simulating the arrival of resource requests and the life cycles of instances and physical machines at runtime. By supplying a scheduling algorithm and a rescheduling algorithm, users can observe how the algorithms perform on the corresponding cluster. (2) Optimization of the simulation system. The system is event-driven and uses component registration and polling to speed up simulation while ensuring that events occur in order. A time-difference simulation scheme is implemented, which lowers simulation cost and improves accuracy. A two-stage algorithm interface is provided for container placement and container migration, and it is designed as a highly scalable distributed architecture. The accuracy of migration simulation is improved by modeling migration delay, resource limits, and related factors, which reproduces the conflicts and failures between scheduling and migration that are common in real clusters. Real-time visualization of cluster status and key performance indicators is also provided. (3) Large-scale cluster scheduling experiments. In the initial scheduling experiment, a Google production cluster was restored and replayed, and a two-level scheduling model (kube-scheduler and DCM) was compared with Borg scheduling. In the rescheduling experiment, an Ant Group production cluster was restored and replayed, and the rescheduling effects of the Dot-Product algorithm and the DCM algorithm were compared and analyzed. |
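
The event-driven replay described in contribution (2) can be illustrated with a minimal sketch. This is not Lothar's implementation: the names `Event`, `Simulator`, `register`, `schedule`, and `replay`, the event kinds, and the trace tuple layout are all hypothetical, chosen only to show how registered components can consume trace events in strict time order.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(order=True)
class Event:
    # Events are ordered by (time, seq); seq breaks ties in insertion order.
    time: int
    seq: int
    kind: str = field(compare=False)
    payload: dict = field(compare=False, default_factory=dict)


class Simulator:
    """Hypothetical event loop: components register handlers per event kind."""

    def __init__(self) -> None:
        self._queue: List[Event] = []
        self._seq = itertools.count()
        self._handlers: Dict[str, List[Callable[["Simulator", Event], None]]] = {}
        self.now = 0

    def register(self, kind: str, handler: Callable[["Simulator", Event], None]) -> None:
        self._handlers.setdefault(kind, []).append(handler)

    def schedule(self, time: int, kind: str, **payload) -> None:
        if time < self.now:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self._queue, Event(time, next(self._seq), kind, payload))

    def run(self) -> None:
        # Pop the earliest event, advance the clock, notify registered components.
        while self._queue:
            event = heapq.heappop(self._queue)
            self.now = event.time
            for handler in self._handlers.get(event.kind, []):
                handler(self, event)


def replay(trace: List[Tuple[int, str, int]]) -> List[Tuple[int, str, str]]:
    """Replay (arrival_time, instance_name, duration) records; return an event log."""
    sim = Simulator()
    log: List[Tuple[int, str, str]] = []

    def on_arrive(sim: Simulator, ev: Event) -> None:
        log.append((sim.now, "arrive", ev.payload["name"]))
        # The end of the instance life cycle becomes a future event.
        sim.schedule(sim.now + ev.payload["duration"], "instance_exit",
                     name=ev.payload["name"])

    def on_exit(sim: Simulator, ev: Event) -> None:
        log.append((sim.now, "exit", ev.payload["name"]))

    sim.register("instance_arrive", on_arrive)
    sim.register("instance_exit", on_exit)
    for t, name, duration in trace:
        sim.schedule(t, "instance_arrive", name=name, duration=duration)
    sim.run()
    return log
```

Because the queue is keyed on simulated time rather than wall-clock time, the simulated clock jumps directly from one event to the next, and the trace records may be supplied in any order.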