Network virtualization is one of the key technologies expected to address the ossification of the Internet. By integrating hardware and software resources, along with network functions, into software-based virtual networks, network virtualization allows heterogeneous services to share the underlying physical resources effectively, thereby enhancing resource utilization and network flexibility. Virtual network embedding (VNE) is one of the key technical issues in network virtualization, as it dictates how virtual networks (VNs) are deployed and connected in 5G networks and beyond. An efficient VNE algorithm is imperative to ensure that VNs are embedded in a manner that satisfies the performance, security, and resource demands of both the VNs and their users. Reinforcement learning (RL) holds the potential to overcome certain limitations of traditional algorithms, such as the requirement for expert knowledge of network conditions and the difficulty of dealing with nonlinear and dynamic network environments. The integration of RL with VNE can therefore lead to more intelligent and efficient network management, boosting the performance of large-scale networked systems. Against this background, this dissertation studies one-stage virtual network embedding based on single-agent deep reinforcement learning and two-stage virtual network embedding based on multi-agent deep reinforcement learning, and, building on this foundation, explores a virtual network embedding algorithm that integrates the single-agent and multi-agent approaches. The main contributions of this dissertation are summarized as follows:

First, this dissertation proposes Coord VNE, a coordinated virtual network embedding algorithm based on deep reinforcement learning. It addresses a limitation of most existing approaches, which rely on greedy heuristics to pre-select node mappings and then apply shortest-path or multi-commodity-flow algorithms for link mapping without considering the relationship between the two stages, which restricts the solution space and may result in sub-optimal performance. Specifically, the two stages of the virtual network embedding problem, virtual node embedding and virtual link embedding, are jointly modeled as a graph embedding process, and generative adversarial networks (GANs) are employed to generate embedding decisions. Additionally, although traditional neural networks are widely used for Euclidean data such as images and audio signals, they are not well suited to graph-structured data such as network topologies; Coord VNE therefore introduces graph convolutional networks (GCNs) to extract the spatial features of topologies. Simulation experiments demonstrate that Coord VNE enhances the acceptance rate of virtual network requests and reduces resource overhead.

Second, to address the "curse of dimensionality" in single-agent RL, where the exponential growth of states and actions when exploring optimal control in high-dimensional spaces can incur significant time and resource consumption, this dissertation decomposes the virtual network embedding problem into two iterative subtasks: virtual node mapping (VNM) and virtual link mapping (VLM). To coordinate the decisions of the two stages and obtain optimal long-term rewards, each stage is modeled as a reinforcement learning agent, and multi-agent deep reinforcement learning algorithms are introduced to optimize the agents' decisions. Additionally, considering the dependency between the two stages, this dissertation further transforms the virtual network embedding problem into a sequence modeling (SM) problem to fully leverage the modeling capability of existing sequence models (such as the Transformer), proposing ARVNE, an autoregressive multi-agent deep reinforcement learning-based virtual network embedding algorithm. Experimental results demonstrate that the proposed methods outperform other algorithms in terms of acceptance rate and
long-term revenue.

Third, although the autoregressive multi-agent deep reinforcement learning algorithm explicitly considers the dependency among the agents' policies in sequential execution, i.e., the forward process, it disregards the reactions of subsequent agents during policy improvement, i.e., the backward process. This may lead to conflicting update directions for individual agents, whose local improvements may jointly produce worse outcomes. In particular, the subsequent link mapping agent provides no feedback on its decisions to the preceding node mapping agent. To address this issue, this dissertation proposes VNEStack, a differentiable autoregressive multi-agent deep reinforcement learning-based virtual network embedding algorithm. VNEStack explicitly accounts for the effect of the node mapping agent's actions on the policy of the subsequent link mapping agent by additionally incorporating agent feedback passed back through the decision dependency. With such rich feedback, the agents can close the causality loop: a cyclic interaction between the forward and backward processes. Simulation experiments demonstrate that VNEStack further improves the performance of ARVNE.

Finally, the one-stage method fully incorporates coordination but faces the "curse of dimensionality", while the two-stage method significantly reduces complexity but sacrifices some performance. To integrate the advantages of both methods and further coordinate the two stages of virtual network embedding, this dissertation draws inspiration from the correlated-equilibrium concept in game theory and introduces strategy modification as a mechanism for coordinating the policies of the node and link mapping agents. Specifically, this dissertation proposes VNEMixer, a novel framework that constructs the joint policy as a nonlinear combination of the node and link embedding policies. VNEMixer treats joint policy learning as a single-agent reinforcement learning problem, thus effectively integrating single-agent and multi-agent algorithms and serving as a general virtual network embedding algorithm. Simulation experiments demonstrate that VNEMixer effectively coordinates the node embedding and link embedding stages, exhibiting superior performance in acceptance rate, network cost, and average network revenue.
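To make the problem setting concrete, the two-stage baseline that the dissertation argues against (greedy node pre-selection followed by shortest-path link mapping, with no coordination between the stages) can be sketched as follows. This is a minimal illustration, not the dissertation's code: the adjacency-dict graph representation, the CPU-based node ranking, and all function names are assumptions made for this sketch.

```python
from collections import deque

def greedy_node_mapping(sub_cpu, vn_cpu):
    """Stage 1: greedily map each virtual node to the substrate node with
    the most remaining CPU, ignoring the consequences for link mapping."""
    cpu = dict(sub_cpu)                      # residual substrate CPU
    mapping = {}
    for v, demand in sorted(vn_cpu.items(), key=lambda kv: -kv[1]):
        host = max((n for n in cpu
                    if n not in mapping.values() and cpu[n] >= demand),
                   key=lambda n: cpu[n], default=None)
        if host is None:
            return None                      # request rejected
        mapping[v] = host
        cpu[host] -= demand
    return mapping

def shortest_path(adj, src, dst):
    """BFS shortest path (hop count) on the substrate topology."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for w in adj[u]:
            if w not in prev:
                prev[w] = u
                queue.append(w)
    return None

def two_stage_embed(sub_adj, sub_cpu, vn_adj, vn_cpu):
    """Stage 1 (node mapping) then stage 2 (link mapping), run independently."""
    nmap = greedy_node_mapping(sub_cpu, vn_cpu)
    if nmap is None:
        return None
    lmap = {}
    for u in vn_adj:
        for v in vn_adj[u]:
            if u < v:                        # visit each virtual link once
                path = shortest_path(sub_adj, nmap[u], nmap[v])
                if path is None:
                    return None
                lmap[(u, v)] = path
    return nmap, lmap
```

Because stage 1 never anticipates stage 2, a CPU-optimal node placement can force long substrate paths or outright link-mapping failure, which is precisely the coordination gap that Coord VNE, ARVNE, VNEStack, and VNEMixer target.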
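The abstract describes VNEMixer's joint policy only as a nonlinear combination of the node and link embedding policies. As a toy illustration of what such a mixer could look like, the sketch below couples two per-stage logit vectors through a nonlinear correction term; the tanh coupling, the scalar weight `W`, and the function names are purely hypothetical stand-ins for the architecture defined in the dissertation body.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - x.max())
    return z / z.sum()

def mixed_joint_policy(node_logits, link_logits, W):
    """Joint policy over (node action, link action) pairs.

    With W = 0 the joint logits are a plain outer sum, so the joint
    policy factorizes into the product of the two per-stage policies.
    A nonzero W adds a nonlinear coupling term, so the node and link
    choices are no longer independent -- the property the mixer needs
    in order to coordinate the two stages.
    """
    joint = node_logits[:, None] + link_logits[None, :]
    joint = joint + np.tanh(W * node_logits[:, None] * link_logits[None, :])
    p = softmax(joint.ravel()).reshape(joint.shape)
    return p
```

Treating this single joint distribution as the action distribution of one agent is what lets VNEMixer cast the coordinated two-stage problem as single-agent reinforcement learning.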