| More and more enterprises use cloud platforms to deploy services and applications, and the diverse demands for cloud computing services dramatically increase the complexity and management difficulty of data centers. Because the resource requirements of latency-sensitive services and batch jobs are complementary, cloud computing service providers deploy mixed workloads composed of these two types of applications in one cluster to share resources and reduce data center operation and maintenance costs. However, running mixed workloads in one cluster introduces performance interference (such as resource contention). This work therefore studies cloud computing cluster resource management methods for mixed workloads, with the goal of improving application performance. Since existing resource reservation methods significantly increase the scheduling latency of latency-sensitive tasks, a buffer-based dynamic resource reservation method (DR4PB) is proposed to reserve resources for latency-sensitive tasks in a timely and on-demand manner. First, to predict the resource requirements of latency-sensitive tasks accurately, a resource requirement prediction approach based on a time-series prediction model is proposed. Analysis of cluster data shows that the number of latency-sensitive task requests fluctuates periodically, so a resource requirement prediction model for latency-sensitive tasks is constructed based on time-series prediction algorithms. Then, to reserve resources for latency-sensitive tasks on demand, a reserved resource management approach based on a dynamic buffer is proposed. Cluster resources are reserved by creating a resource buffer in the cluster and configuring its capacity from the predicted resource requirements; the buffer is then managed dynamically according to the usage of buffer resources to ensure that they remain sufficient. DR4PB directly 
allocates buffer resources to latency-sensitive task requests to prevent them from entering the scheduling queue due to unsatisfied resource requirements, thus improving the scheduling efficiency of latency-sensitive tasks. Compared with the existing state-of-the-art dynamic resource reservation method, DR4PB reduces the queuing ratio of latency-sensitive tasks by 21.98%, shortens the average task scheduling delay by 44.98%, and increases the average utilization of cluster resources by 12.95%. Since existing resource preemption methods cause resource provisioning for latency-sensitive tasks to fail and seriously damage the performance of batch jobs, a task resource provisioning method based on surplus resources (TERMS) is proposed so that surplus resources can be used by queuing tasks. Analysis of cluster data shows that many allocated resources are underutilized. Here, surplus resources are defined as resources that have been allocated to tasks but are not used. A large amount of surplus resources makes the cluster appear highly utilized while its actual resource utilization is very low. First, to obtain information about surplus resources in the cluster, a surplus resource identification approach based on the utilization of allocated resources is proposed. By analyzing the resource usage and resource allocation of tasks, over-provisioned tasks are screened out from batch tasks, and the corresponding surplus resource information is maintained. Then, to make surplus resources available to queuing latency-sensitive tasks, a pre-scheduling-based approach to surplus resource reclamation and task resource provisioning is proposed. Queuing latency-sensitive tasks are pre-scheduled according to task relevance and the distribution of redundant resources, and task resource provisioning is completed on the assigned worker nodes by reclaiming surplus resources according to the pre-scheduling decisions. Compared with the existing state-of-the-art resource 
preemption method, TERMS shortens the latency-sensitive task scheduling latency by 52.35%, reduces the batch job completion time by 28.66%, and improves the average utilization of cluster resources by 15.16%. Since existing straggler task elimination methods significantly increase the resource overhead and running time of latency-sensitive tasks, ESPAC, a container-based straggler task elimination method, is proposed. Based on an in-depth analysis of the causes of latency-sensitive straggler tasks, ESPAC improves straggler task management efficiency with container-based elimination operations (such as dynamic expansion of task resources, online task migration, and checkpoint-based preservation and resumption of task status). First, ESPAC runs latency-sensitive tasks in containers and periodically creates checkpoints to save their running status, which avoids losing task processing progress and reduces performance penalties. Then, to identify latency-sensitive straggler tasks, a straggler task identification approach based on running time prediction is proposed. Straggler tasks are screened out from latency-sensitive tasks by analyzing monitoring data on the tasks’ running status. Finally, to eliminate latency-sensitive straggler tasks flexibly and with low overhead, a straggler task elimination approach based on cause diagnosis is proposed. By diagnosing the causes of straggler tasks, corresponding elimination strategies are formulated, and container-based straggler task elimination operations are then executed on worker nodes to bring latency-sensitive tasks back to normal running. Compared with the method of eliminating straggler tasks by creating multiple replica tasks in advance, ESPAC reduces the task resource overhead by 37.81% with only a 4.52% increase in the average task completion time, and increases the cluster workload throughput rate by 39.96%. |
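
The abstract defines surplus resources as resources that have been allocated to tasks but are not used, and describes screening over-provisioned tasks out of batch tasks by comparing resource usage with resource allocation. The following is a minimal sketch of that idea, not the TERMS implementation: the `Task` fields, the CPU-only view of resources, and the `max_utilization` threshold of 0.5 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # All fields are illustrative assumptions, not taken from the source.
    name: str
    kind: str             # "batch" or "latency-sensitive"
    allocated_cpu: float  # CPU cores allocated to the task
    used_cpu: float       # CPU cores the task actually uses


def surplus_cpu(task: Task) -> float:
    """Surplus = allocated but unused resources (never negative)."""
    return max(task.allocated_cpu - task.used_cpu, 0.0)


def find_over_provisioned(tasks, max_utilization=0.5):
    """Map each over-provisioned batch task to its surplus CPU.

    A batch task counts as over-provisioned when the utilization of its
    allocation (used / allocated) is below max_utilization. The threshold
    is an assumed parameter, not a value from the source.
    """
    surplus = {}
    for task in tasks:
        if task.kind != "batch" or task.allocated_cpu <= 0:
            continue
        utilization = task.used_cpu / task.allocated_cpu
        if utilization < max_utilization:
            surplus[task.name] = surplus_cpu(task)
    return surplus


tasks = [
    Task("batch-a", "batch", allocated_cpu=4.0, used_cpu=1.0),
    Task("batch-b", "batch", allocated_cpu=2.0, used_cpu=1.5),
    Task("web-1", "latency-sensitive", allocated_cpu=2.0, used_cpu=0.5),
]
print(find_over_provisioned(tasks))  # {'batch-a': 3.0}
```

Only `batch-a` is reported: it uses a quarter of its allocation, whereas `batch-b` uses three quarters and `web-1` is not a batch task. A real system would presumably track several resource types and use a usage window rather than a single sample.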