| In recent years, datacenters have developed rapidly. Many large enterprises, such as Amazon, Google, and Microsoft, have deployed large datacenters in multiple geographical locations to serve millions of users around the world. However, natural disasters and man-made destruction have made data security a growing concern. Data redundancy is the most common and effective way to ensure data security. To achieve it, terabytes to petabytes of data must be copied regularly across the inter-datacenter network and distributed to three or more remote datacenters. We call this disaster backup in the inter-datacenter network. As a routine datacenter task, disaster backup consumes a large amount of network bandwidth, which has made it a focus of datacenter research. Many researchers use multicast routing to carry out redundant backup and reduce its bandwidth cost. However, in addition to daily backup, datacenters must also support real-time interaction with users. To give users a good experience, it is important to avoid link congestion and reduce interaction delay. Backup activities usually transmit large amounts of data, which can easily congest individual links and seriously degrade the interactive experience between users and the datacenter. Therefore, for disaster backup in the inter-datacenter network, it is particularly important to jointly optimize two objectives: reducing bandwidth cost and balancing load. However, multi-objective optimization problems are usually NP-hard, so conventional algorithms cannot obtain the optimal solution within an acceptable computation time. Reinforcement learning, an intelligent algorithm that has developed rapidly in recent years, can obtain good solutions to NP-hard problems in an acceptable time. Therefore, we
choose reinforcement learning to solve our multi-objective optimization problem. A multi-objective problem is usually converted approximately into a single-objective one by weighting the different objectives, but choosing those weights is a well-known difficulty, and inappropriate weights greatly reduce solution quality. Fortunately, a reinforcement learning framework based on the Chebyshev scalarization function lets us handle weight selection in a mathematically principled way and obtain better solutions. The main work of this thesis is the use of reinforcement learning for multi-objective disaster backup in the inter-datacenter network, and it has the following three aspects: ● Load Balanced-Multiple Steiner Trees (LB-MST). We abstract the above scenario and model it as the LB-MST problem: for multiple backup requests, build corresponding Steiner trees that minimize bandwidth cost while balancing the overall network load. We use the store-and-forward mechanism and multicast routing to optimize backup transmission cost and load balance simultaneously. Using the time-expanded network, we propose the time-expanded version of the LB-MST problem as our optimization model. We also simplify and transform the general network topology and propose a more widely applicable general version of the LB-MST problem. ● Multicast Backup Multi-Objective Reinforcement Learning (MB-MORL). To solve the LB-MST problem in polynomial time, we propose the MB-MORL algorithm for the time-expanded version of the LB-MST problem. We use multicast routing and the store-and-forward mechanism to build multiple disaster backup multicast trees. In the time-expanded network, we use multi-objective reinforcement learning to find appropriate time slots in which to forward data. Finally, we use the properties of the Chebyshev scalarization function to solve the weight selection problem for bandwidth cost and load
balance, and obtain the solution set with the best hypervolume. ● Multicast Backup with Delay Processing Multi-Objective Reinforcement Learning (MBDP-MORL). The MBDP-MORL algorithm is a multicast routing algorithm that builds on MB-MORL and adopts a "delay processing" strategy, which allows it to solve the general version of the LB-MST problem. Compared with MB-MORL, its most notable feature is that it divides route search into two stages: forwarding tree construction and rate allocation. This greatly reduces computational complexity compared with searching for routes directly in the time-expanded network. We also integrate a tree pruning optimization into the algorithm to reduce the size of the disaster backup multicast trees, which leaves more flexible scheduling space for rate allocation. Finally, we comprehensively evaluate the performance of the MB-MORL and MBDP-MORL algorithms through simulation experiments. The results show that, under different parameter settings, both algorithms outperform existing disaster backup solutions in three respects: total network bandwidth cost, load balance, and hypervolume. In addition, the computation time of MBDP-MORL is reduced to 1/5 of that of MB-MORL. |
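
The abstract names two tools without showing them: the Chebyshev scalarization function used for action selection, and the hypervolume used to compare solution sets. The following is a minimal sketch of both, not the thesis implementation. It assumes two objectives that are both minimized (bandwidth cost and a load-imbalance measure), and the function names, the action labels `send_now` and `wait`, and all numbers are hypothetical.

```python
def chebyshev_scalarize(values, weights, utopia):
    """Weighted Chebyshev distance from a utopian point (lower is better)."""
    return max(w * abs(v - z) for v, w, z in zip(values, weights, utopia))


def greedy_action(q_vectors, weights, utopia):
    """Return the action whose Q-vector has the smallest scalarized value.

    q_vectors maps each action to a tuple of estimated costs, one per
    objective (assumed here: bandwidth cost, load imbalance).
    """
    return min(
        q_vectors,
        key=lambda a: chebyshev_scalarize(q_vectors[a], weights, utopia),
    )


def hypervolume_2d(points, reference):
    """Area dominated by 2-objective points (both minimized) up to `reference`."""
    # Keep only points strictly better than the reference in both objectives,
    # then sweep them in order of increasing first objective.
    pts = sorted(p for p in points if p[0] < reference[0] and p[1] < reference[1])
    area = 0.0
    best_y = reference[1]
    for x, y in pts:
        if y < best_y:  # non-dominated so far: adds a new horizontal strip
            area += (reference[0] - x) * (best_y - y)
            best_y = y
    return area


# Hypothetical Q-vectors for one state: (bandwidth cost, load imbalance).
q = {"send_now": (4.0, 0.9), "wait": (5.0, 0.3)}
utopia = (3.0, 0.0)  # assumed lower bounds on the two objectives

print(greedy_action(q, (0.5, 0.5), utopia))  # equal weights
print(greedy_action(q, (0.1, 0.9), utopia))  # emphasis on load balance
print(hypervolume_2d([(1, 3), (2, 2), (3, 1)], (4, 4)))
```

With equal weights the sketch prefers `send_now`, while shifting weight toward load balance makes it prefer `wait`. Running the selection under several weight vectors and collecting the resulting solutions gives a set whose quality can then be summarized by its hypervolume.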