
Research And Application Of Energy Efficiency Model And Task Scheduling Based On Heterogeneous Spark Cluster

Posted on: 2022-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wei
GTID: 2518306575966349
Subject: Computer technology
Abstract/Summary:
In the era of cloud computing, the Spark computing framework has become a mainstream framework thanks to its excellent performance, which makes research on Spark particularly important. As hardware technology iterates rapidly, hardware heterogeneity within a cluster becomes unavoidable, and applications with different characteristics place different demands on CPU and other hardware performance. Both differences affect task scheduling. Inappropriate scheduling cannot make full use of node resources, which leads to energy efficiency problems and means that quality of service cannot be guaranteed. To solve these problems, this thesis proposes the TPCBFD scheduling method and the TPCGM energy efficiency model. Experiments show that the proposed methods are effective. The main work of this thesis is as follows.

1. Build the basic platform for TPCBFD and TPCGM. This work includes modifying part of the Spark source code, collecting information about the running state of applications through information collection scripts, and completing data processing through the node and application evaluation module, the energy consumption processing module, and the energy efficiency data processing module.

2. To optimize resource matching between heterogeneous environments and different types of applications, the TPCBFD scheduling method for heterogeneous Spark clusters is proposed. The scheduling algorithm in the current Spark computing framework spreads tasks across the idle computing nodes of the cluster to maximize the utilization of cluster resources. As a result, the performance a node is strong in may not match the performance that the tasks running on it depend on most. If the performance an application depends on most at runtime can be matched with the strongest performance of a heterogeneous node, then the node's strengths and the task's needs complement each other. For this reason, this thesis proposes TPCBFD, which is based on clustering of heterogeneous nodes. Experimental results show that the proposed scheduling algorithm achieves an optimization effect of 20.7792% compared with the original algorithm.

3. To optimize energy efficiency and SLA under Spark's native scheduling policy, TPCGM, a gain energy efficiency model based on clustering performance, is proposed. Spark's native scheduling algorithm schedules tasks randomly without differentiating between them, which cannot guarantee energy efficiency or SLA. For this reason, this thesis proposes TPCGM to guide task scheduling. Experimental results show that applying TPCGM to Spark's native scheduling algorithm effectively improves SLA, and program running speed increases by 21.3503%.
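
The matching idea in item 2, placing a task on the node whose strongest performance dimension fits what the task depends on most, can be illustrated with a small sketch. This is not the thesis's TPCBFD algorithm, whose details the abstract does not give. The `Node` and `Task` classes, the per-resource scores, the slot counts, and the weighted-sum scoring rule are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    scores: dict      # hypothetical relative performance per resource, e.g. {"cpu": 0.9}
    free_slots: int   # hypothetical number of tasks the node can still accept


@dataclass
class Task:
    name: str
    demand: dict      # hypothetical weight of each resource for this task


def match_score(task: Task, node: Node) -> float:
    """Weighted sum of node performance over the resources the task relies on."""
    return sum(weight * node.scores.get(res, 0.0) for res, weight in task.demand.items())


def assign(tasks: list, nodes: list) -> dict:
    """Greedily place each task on the free node with the highest match score."""
    placement = {}
    # Handle tasks with the most pronounced resource focus first (assumed ordering).
    for task in sorted(tasks, key=lambda t: max(t.demand.values()), reverse=True):
        candidates = [n for n in nodes if n.free_slots > 0]
        if not candidates:
            break  # no capacity left; remaining tasks stay unplaced
        best = max(candidates, key=lambda n: match_score(task, n))
        best.free_slots -= 1
        placement[task.name] = best.name
    return placement


if __name__ == "__main__":
    nodes = [
        Node("cpu_node", {"cpu": 0.9, "mem": 0.3}, free_slots=1),
        Node("mem_node", {"cpu": 0.3, "mem": 0.9}, free_slots=1),
    ]
    tasks = [
        Task("sort_job", {"cpu": 0.8, "mem": 0.2}),
        Task("cache_job", {"cpu": 0.2, "mem": 0.8}),
    ]
    print(assign(tasks, nodes))
```

In this toy setup the CPU-focused task lands on the CPU-strong node and the memory-focused task on the memory-strong node, whereas a scheduler that only looks for idle nodes could just as easily swap them.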
Keywords/Search Tags: cloud computing, heterogeneity, Spark, task scheduling, efficiency models