High Performance Computer Job Runtime Prediction Based On Job Attribute Semantics

Posted on:2024-05-03

Degree:Master

Type:Thesis

Country:China

Candidate:L F Zhou

Full Text:PDF

GTID:2558307073468694

Subject:Software engineering

Abstract/Summary:

Job runtime prediction is an active research topic in high-performance computing (HPC). As the computational load and intensity of HPC jobs grow, so does their demand for resources, and the service pressure on supercomputers multiplies accordingly. To keep an HPC production environment running in an orderly and efficient manner, job scheduling rules are especially important for ensuring fairness among jobs competing for resources. Jobs differ in runtime and resource requirements, and queued jobs wait until resources are allocated to them. The backfill scheduling strategy selects suitable queued jobs to run ahead of time on reserved resources without delaying the job at the head of the queue, which greatly improves the system's resource utilization. However, the real runtime of a job that has not yet run is hard to estimate, so accurate runtime prediction remains a challenging problem. Existing studies have proposed many prediction methods, but their accuracy has proved difficult to improve further.

This thesis focuses on improving the accuracy of job runtime prediction. The attribute semantics of a job greatly enrich the description of its characteristics. This semantic information is preprocessed and structured so that each job has specific semantic features. Similar jobs are then clustered according to those features, and the resulting high-quality data is combined with machine learning methods to predict job runtime, further improving prediction accuracy. Prediction accuracy is used as the performance metric: the closer the value is to 100%, the better the result.

The thesis targets the low runtime prediction accuracy of HPC jobs. The work consists of two parts:

1. Building on traditional machine-learning-based runtime prediction, a fine-grained, application-based prediction method is proposed. Machine learning prediction methods use only a few features, which cannot fully describe the multidimensional characteristics of jobs, so prediction accuracy is hard to improve once it reaches a bottleneck. This thesis proposes the PREP time prediction framework, which clusters similar jobs into an application according to the path from which each job is submitted and predicts job runtime at the granularity of the application. An application also carries rich semantic information, such as user name, project name, instance name, parameter set, and data set, which describes a job's multidimensional information more accurately, strengthens machine learning training, and thus improves runtime prediction. The experimental results show that PREP outperforms other methods under the same experimental conditions, reaching a prediction accuracy of 88.5%.

2. Similar jobs often have similar runtimes. Job similarity is derived directly from the similarity of job names, which users choose to indicate what a job does. Clustering algorithms based on string edit distance have difficulty clustering similar job names together. Starting from how users name jobs, this thesis summarizes user naming rules and designs the job name clustering algorithm LSN. The algorithm divides a job name into three components: letters, special characters, and numbers. Each component is clustered with a different method, and the methods are applied in the order "letters, special characters, numbers" according to the importance of each component. The goal is to cluster similar job names together and so improve data quality, which in turn improves runtime prediction accuracy. The experimental results show that, compared with traditional string clustering algorithms, LSN improves data quality the most, achieving the highest prediction accuracy of 79.5%.

Finally, the detailed experimental data demonstrate the advantages of the design.
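
The decomposition of job names into letters, special characters, and numbers can be illustrated with a minimal Python sketch. This is not the thesis's LSN algorithm, whose per-component clustering methods are not given in the abstract. It is a simplified stand-in that assumes exact matching on the letter component first and the special-character component second, and ignores the number component as the least important. The function names `split_job_name` and `group_job_names` and the sample job names are hypothetical.

```python
import re
from collections import defaultdict


def split_job_name(name):
    """Split a job name into its letter, special-character, and number parts.

    Assumption: "letters" means ASCII letters (case-folded), "numbers" means
    runs of digits, and everything else counts as a special character.
    """
    letters = "".join(re.findall(r"[A-Za-z]+", name)).lower()
    specials = "".join(re.findall(r"[^A-Za-z0-9]+", name))
    numbers = tuple(int(n) for n in re.findall(r"[0-9]+", name))
    return letters, specials, numbers


def group_job_names(names):
    """Group job names by (letters, specials), ignoring the numbers.

    This exact-match grouping is a simplification: it only mirrors the
    "letters, special characters, numbers" priority order, not the
    clustering methods used by LSN itself.
    """
    groups = defaultdict(list)
    for name in names:
        letters, specials, _numbers = split_job_name(name)
        groups[(letters, specials)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical job names, for illustration only.
    sample = ["vasp_run_01", "vasp_run_02", "lammps-test3",
              "vasp_run_10", "lammps-test7"]
    for key, members in group_job_names(sample).items():
        print(key, members)
```

Under these assumptions, names that differ only in a run index fall into the same group, which is the kind of similarity that edit-distance clustering tends to miss when the numeric suffixes vary in length.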

Keywords/Search Tags:

High performance computing, Backfill scheduling, Machine learning, Time prediction, Data clustering

Related items

1	Research And Implementation Of Job Runtime Prediction And Job Scheduling Based On High-performance Computing Job Log
2	Research On Data-driven Performance Prediction And Optimization Of HPC Programs
3	The Research On Parallel Computing Jobs Scheduling Algorithm For Backfill-Based
4	Research On Hybrid Cluster-Based High Performance Computing Job Perception And Scheduling Technology
5	High-performance Computing System Memory Subsystem Performance Prediction Model
6	A Multi DAG Scheduling Strategy Based On Backfill In Cloud Computing
7	Research And Implementation Of Spark Application Performance Prediction Model Based On Machine Learning
8	Research And Application Of Load-Prediction Scheduling On CPU-GPU Heterogeneous High Performance Computing
9	Prediction And Optimization Of Performance And Cost In Cloud Computing Virtual Machine Scheduling
10	Machine Learning Based Energy Efficiency Modeling Of Computing Nodes In High-Performance Computing Systems