| The emergence of cloud technology has enabled multiple resources to be shared in a networked environment, allowing them to be used efficiently and managed centrally. In recent years, driven by technological progress, industry demand and the advancement of new infrastructure, cloud computing has grown rapidly, and the scale of resources, users, tasks and workflows has expanded. How to organise massive resources effectively, cope with random fluctuations in demand, and control costs through timely scaling while safeguarding service quality has made cloud resource scheduling and cloud service operation optimisation an important research area and a common concern for enterprises and academia. In particular, with the recent popularity of containerised high-density hybrid clouds, the mixed deployment of big data cluster services and microservices on various edge-end clouds and the flexible combination of complex process applications for diverse user needs have become key management objects, not only for intelligent operation and maintenance in the cloud but also for core business optimisation. Estimating and predicting the trends of key indicators for future tasks in process operation is an important prerequisite for effective intelligent scheduling. This paper therefore addresses the problem of predicting the performance of future tasks in cloud workflows. Current academic research on cloud workflow performance prediction faces three difficulties. First, cloud workflow data from real scenarios is difficult to obtain, and most current research is based on simulation data. Second, current research on cloud workflow performance prediction is mostly based on individual virtual machines (VMs), instances, individual tasks or individual hosts, and does
not consider the cloud workflow as a whole, and therefore lacks contextual information. Third, a small number of studies treat cloud workflow performance prediction as a time series prediction problem and so take the holistic nature of cloud workflows into account; however, the sequence information and inter-task dependencies collected in these studies are limited and coarse-grained, so the graph structure is under-utilised. The goal of this paper is to predict a given metric for a given task in a cloud workflow. First, given the characteristics of cloud workflow data, namely a directed acyclic graph (DAG) structure, irregular data structure and sparse task nodes, and after summarising the relevant classical problems, this paper selects the Graph Neural Network (GNN) for cloud workflow performance prediction: it is a deep learning algorithm that operates natively on graph structures and can make full use of the structural characteristics of cloud workflows. Second, considering the practical importance of scale and real-time requirements for cloud workflow scheduling in real scenarios, this paper proposes a pre-training model applicable to GNNs, which learns the common features of cloud workflows and then transfers the pre-trained parameters to the target dataset for fine-tuning through transfer learning. This approach offers significant savings in computational and time costs, and experiments show that it also improves the accuracy of the prediction model. Finally, to demonstrate the relevance of this research, Ali Cloud cluster log data was traced and the resulting dataset was applied to an experimental scenario. The experiment used the mean_ca of the last task node in the cloud workflow as the prediction target, and the results validated the advantages of adding pre-training to cloud workflow performance prediction. Therefore, the research in
this paper offers some reference value for enterprises seeking to predict the performance of cloud workflows efficiently and flexibly. In addition, to improve model accuracy, the study proposes a data processing method for graph-structured data such as cloud workflows: virtual nodes are inserted into the graph to construct heterogeneous graphs and eliminate isolated nodes in the cloud workflow, and 13 graph-theoretic features are added to the node features for feature supplementation and representation learning, so that more features related to the workflow structure are obtained and the GNN can capture the graph structure more deeply. A pre-training mechanism for cloud workflows, Random NodePiece Masking, is also proposed to improve the learning and generalisation capability of the model: node features in the workflow are masked with a certain probability, and the model must recover the masked node features using the features of other nodes and the graph structure of the workflow. Extensive experimental validation shows the applicability and effectiveness of the "random node masking pre-training + heterogeneous Graph Attention Network" architecture for cloud workflow performance prediction, as well as the effectiveness of graph-theoretic features and virtual nodes on graph-structured data. Moreover, in the context of Industry 4.0, predicting future tasks in workflows can be useful for both operations and maintenance. The algorithm proposed in this paper is not limited to cloud workflow performance prediction; it also provides a theoretical reference for prediction on similar graph-structured data. The study also points out the drawbacks of enterprises relying only on simple statistics for resource estimation when managing cloud computing. Pre-training is also an innovative management model for dealing with the accumulation of large amounts of data, which can
shorten prediction model iteration cycles and improve prediction performance overall. |
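
As a rough illustration of the two preprocessing ideas described in the abstract (virtual nodes that remove isolated tasks, and random masking of node features for pre-training), the following Python sketch uses only the standard library. It is not the paper's implementation: the function names, the single virtual node linked to every task, the edge-type labels and the zero vector used as the mask value are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

TypedEdge = Tuple[int, int, str]  # (source, target, edge type)


def isolated_nodes(num_nodes: int, edges: Sequence[tuple]) -> List[int]:
    """Return the nodes that appear in no edge (first two items are endpoints)."""
    touched = set()
    for edge in edges:
        touched.add(edge[0])
        touched.add(edge[1])
    return [n for n in range(num_nodes) if n not in touched]


def add_virtual_node(
    num_nodes: int, dag_edges: Sequence[Tuple[int, int]]
) -> Tuple[int, List[TypedEdge]]:
    """Add one virtual node and return (new node count, typed edge list).

    Assumption: the virtual node is linked to every task node, so a task
    without dependencies is no longer isolated. The two edge types make
    the resulting graph heterogeneous.
    """
    virtual_id = num_nodes  # hypothetical convention: virtual node gets the next id
    edges: List[TypedEdge] = [(u, v, "depends_on") for u, v in dag_edges]
    edges += [(task, virtual_id, "to_virtual") for task in range(num_nodes)]
    return num_nodes + 1, edges


def mask_node_features(
    features: Sequence[Sequence[float]], mask_prob: float, rng: random.Random
) -> Tuple[List[List[float]], List[int]]:
    """Mask each node's feature vector independently with probability mask_prob.

    Assumption: a masked vector is replaced by zeros. Returns the masked copy
    and the ids of the masked nodes, which a model would be trained to recover.
    """
    masked = [list(row) for row in features]  # copy so the input is untouched
    masked_ids: List[int] = []
    for node_id, row in enumerate(masked):
        if rng.random() < mask_prob:
            masked[node_id] = [0.0] * len(row)
            masked_ids.append(node_id)
    return masked, masked_ids
```

For a four-task workflow with dependencies `[(0, 1), (1, 2)]`, task 3 is isolated; after `add_virtual_node(4, [(0, 1), (1, 2)])` every task has at least one edge. In a pre-training setup along the lines the abstract describes, a GNN would then be trained to reconstruct the rows listed in `masked_ids` from the remaining node features and the graph structure.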