With the advent of the big data era, explosive data growth often brings large-scale tasks with complex dependencies. To address the scheduling strategy and efficiency optimization problems associated with large-scale task processing, most researchers now study heterogeneous distributed computing systems in cloud environments, and the directed acyclic graph (DAG) has been widely adopted because it can represent the dependencies between tasks. In the DAG task scheduling problem studied in this thesis, a group of distributed tasks with dependencies is allocated appropriate computing resources so as to obtain a feasible schedule with the minimum scheduling length (makespan). The main work of this thesis is as follows:

1. This thesis first constructs a single-objective optimization model based on real scheduling scenarios, aiming to obtain the schedule with the shortest scheduling length. Most DAG scheduling algorithms are prone to falling into local optima. This thesis therefore combines clustering and task duplication strategies into a task duplication based clustering framework, TDCF, and proposes a selection matrix, SM, that records the candidate tasks that can be used to generate task clusters. Based on the selection matrix, multiple feasible solutions can be generated, from which the optimal schedule is chosen from a global perspective.

2. The tasks are arranged into an inverse topological sequence according to their priorities, and a task clustering algorithm is applied to generate the initial schedule. This thesis then proposes a task duplication algorithm based on idle time slot insertion, which makes full use of the idle time fragments that arise on virtual machines when duplicated tasks occupy them. Combined with a cluster merging algorithm, the initial schedule is further optimized and updated, improving system resource utilization and yielding better scheduling results.

3. In the experimental part, this thesis generates random DAG task models to simulate different task dependencies and sets the relevant parameters for the proposed algorithm. To verify its effectiveness, batch experiments are carried out over different parameter combinations, and the proposed algorithm is compared with two advanced algorithms, HEFT and TDCA. The experimental results show that the proposed algorithm outperforms both comparison algorithms in terms of effectiveness and robustness.
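For readers unfamiliar with priority-based list scheduling, the following is a minimal sketch of two building blocks the abstract names: ordering tasks by priority and inserting a task into an idle time slot on a virtual machine. It assumes the priority is HEFT's upward rank (HEFT being one of the baselines), that each VM runs one task at a time, and that communication cost is zero between tasks on the same VM. It is a HEFT-style scheduler, not TDCF: it does not implement task clustering, the selection matrix SM, task duplication, or cluster merging. All function names and the example DAG are hypothetical.

```python
"""Minimal HEFT-style list scheduling with insertion into idle time slots.

Illustrative only: not TDCF; no clustering, selection matrix, or task duplication.
"""


def upward_rank(succ, w):
    """rank(t) = w[t] + max over successors s of (c(t, s) + rank(s)); exit tasks get w[t]."""
    memo = {}

    def rank(t):
        if t not in memo:
            memo[t] = w[t] + max((c + rank(s) for s, c in succ.get(t, [])), default=0)
        return memo[t]

    return {t: rank(t) for t in w}


def priority_order(succ, w):
    """Tasks sorted by descending upward rank; a topological order when all w[t] > 0."""
    r = upward_rank(succ, w)
    return sorted(w, key=lambda t: -r[t])


def earliest_start(busy, ready, duration):
    """Earliest start >= ready that fits in an idle gap of one VM.

    busy: non-overlapping (start, end) intervals sorted by start.
    """
    t = ready
    for s, e in busy:
        if t + duration <= s:  # the task fits in the idle gap before this interval
            return t
        t = max(t, e)
    return t  # otherwise append after the last busy interval


def schedule(succ, cost, n_vms):
    """Place each task, in priority order, on the VM giving the earliest finish time.

    cost[t][k]: execution time of task t on VM k (hypothetical heterogeneous costs).
    An edge's communication cost is paid only if the predecessor ran on another VM.
    """
    w = {t: sum(c) / len(c) for t, c in cost.items()}  # mean cost as priority weight
    pred = {t: [] for t in cost}
    for t, edges in succ.items():
        for s, c in edges:
            pred[s].append((t, c))

    placed = {}  # task -> (vm, start, finish)
    timeline = [[] for _ in range(n_vms)]  # busy intervals per VM
    for t in priority_order(succ, w):
        best = None
        for k in range(n_vms):
            ready = max(
                (placed[p][2] + (0 if placed[p][0] == k else c) for p, c in pred[t]),
                default=0,
            )
            start = earliest_start(timeline[k], ready, cost[t][k])
            finish = start + cost[t][k]
            if best is None or finish < best[2]:
                best = (k, start, finish)
        placed[t] = best
        timeline[best[0]].append(best[1:])
        timeline[best[0]].sort()
    return placed, max(f for _, _, f in placed.values())


# Hypothetical 4-task DAG on 2 heterogeneous VMs (values invented for illustration).
EXAMPLE_SUCC = {"A": [("B", 2), ("C", 3)], "B": [("D", 1)], "C": [("D", 2)]}
EXAMPLE_COST = {"A": [2, 3], "B": [3, 2], "C": [4, 4], "D": [2, 2]}

if __name__ == "__main__":
    placed, makespan = schedule(EXAMPLE_SUCC, EXAMPLE_COST, 2)
    print(placed)
    print("makespan:", makespan)  # 9
```

The abstract's duplication step also exploits idle time fragments on VMs, so a gap search like `earliest_start` is one plausible ingredient of such a step; how TDCF actually selects tasks to duplicate and merges clusters is not modelled here.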