With the development of new network services, whose characteristics are more dynamic and less predictable, the requirements on network delay and bandwidth have become more demanding. Traditional routing algorithms, however, compute routes for only a single optimization objective and ignore overall network performance, so they cannot adapt to complex, changing, and highly dynamic network environments. For resource allocation in optical networks, traditional algorithms split the problem into a routing subproblem and a wavelength assignment subproblem that are solved separately, which makes the computation highly complex. A simple method that solves resource allocation and routing simultaneously is therefore needed. Reinforcement learning uses unlabeled data: an agent learns how close it is to the correct answer from the feedback of a reward function, and finds the optimal decision through trial and error in its environment. Compared with other machine learning algorithms, reinforcement learning is better suited to solving complex decision-making problems in practice. This thesis introduces reinforcement learning algorithms into cross-layer resource allocation and routing planning for optical networks, and investigates cross-layer resource allocation and routing algorithms for metropolitan area networks. The main work and innovations of the thesis are as follows.

First, a Q-Learning-based routing and cross-layer resource allocation algorithm for Optical Transport Networks (OTN) is proposed. It introduces a path evaluation mechanism that gives different rewards according to metrics such as overhead, resource occupancy, node importance, and available wavelength resources. This turns Routing and Wavelength Assignment (RWA) in optical networks from a complex joint problem of path selection and resource allocation into the single problem of setting the reward function, and the RWA problem is solved by selecting the path with the largest cumulative reward as the optimal routing decision. Considering the three-layer structure of OTN, a corresponding service validation scenario is set up for each layer. Simulations use the pan-European COST239 network topology, with link overhead, bandwidth, and wavelength resource occupancy set randomly according to current OTN characteristics, and the algorithm is compared with the traditional Dijkstra algorithm and with the KSP algorithm combined with First-Fit. The results show that in the must-pass and must-avoid node scenario, the algorithm computes the same shortest path as Dijkstra. In the load balancing and wavelength consistency scenarios, its output meets the service transmission requirements: it avoids the first-attempt resource allocation failures of the Dijkstra and KSP algorithms and outputs a route and resource allocation scheme that meets the requirements in a single computation. In the link separation scenario, the algorithm outputs routing results in which the nodes are completely separated.

Second, because Q-table storage and lookup become difficult when the network topology is complex and the number of nodes is large, a Deep Q-Network (DQN)-based routing and cross-layer resource allocation algorithm for optical transport networks is proposed. It replaces the Q-table with a neural network whose output is used instead of Q-table lookups, and introduces an experience replay mechanism that reduces the correlation among past experiences and thus improves the utilization of historical data. The results show that in the must-avoid node scenario, the algorithm outputs links that satisfy the must-avoid node requirements without any network pruning. In the load balancing scenario, its output meets the load balancing requirements better than that of the traditional algorithms. In the wavelength consistency scenario, where the traditional Dijkstra and KSP algorithms with First-Fit fail to allocate wavelengths, this algorithm outputs a wavelength-consistent selection for the whole link on the first attempt. In the link separation scenario, it outputs links that satisfy the link separation requirements.
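The idea of choosing the route with the largest cumulative reward can be illustrated with a minimal tabular Q-Learning sketch. Everything in it is a hypothetical assumption for illustration: the four-node topology, the two link metrics (overhead and wavelength occupancy), the weights `W_OVERHEAD` and `W_OCCUPANCY`, and the helper names `train_q` and `best_path`. It is not the thesis's reward design, which additionally weighs node importance and available wavelength resources.

```python
import random

# Hypothetical toy topology: node -> {neighbor: (link_overhead, wavelength_occupancy)}.
# Occupancy is an assumed fraction of wavelengths already in use on the link.
GRAPH = {
    "A": {"B": (1, 0.8), "C": (2, 0.1)},
    "B": {"A": (1, 0.8), "C": (1, 0.5), "D": (1, 0.8)},
    "C": {"A": (2, 0.1), "B": (1, 0.5), "D": (2, 0.1)},
    "D": {"B": (1, 0.8), "C": (2, 0.1)},
}

# Assumed weights for folding several metrics into one reward (not from the thesis).
W_OVERHEAD = 1.0
W_OCCUPANCY = 5.0


def reward(u, v):
    """Negative weighted link cost: cheap, lightly occupied links earn more reward."""
    overhead, occupancy = GRAPH[u][v]
    return -(W_OVERHEAD * overhead + W_OCCUPANCY * occupancy)


def train_q(src, dst, episodes=2000, alpha=0.5, epsilon=0.2, max_hops=8, seed=0):
    """Tabular Q-learning with undiscounted returns (gamma = 1), so Q(u, v)
    estimates the cumulative reward of hopping u -> v and then routing optimally."""
    rng = random.Random(seed)
    q = {(u, v): 0.0 for u in GRAPH for v in GRAPH[u]}
    for _ in range(episodes):
        node = src
        for _ in range(max_hops):
            actions = list(GRAPH[node])
            if rng.random() < epsilon:
                nxt = rng.choice(actions)  # explore a random next hop
            else:
                nxt = max(actions, key=lambda a: q[(node, a)])  # exploit
            # Reaching the destination ends the episode, so it has no future value.
            future = 0.0 if nxt == dst else max(q[(nxt, a)] for a in GRAPH[nxt])
            q[(node, nxt)] += alpha * (reward(node, nxt) + future - q[(node, nxt)])
            if nxt == dst:
                break
            node = nxt
    return q


def best_path(q, src, dst):
    """Greedily follow the next hop with the largest Q-value."""
    path = [src]
    while path[-1] != dst and len(path) <= len(GRAPH):  # guard against cycles
        node = path[-1]
        path.append(max(GRAPH[node], key=lambda a: q[(node, a)]))
    return path


if __name__ == "__main__":
    q_table = train_q("A", "D")
    print(best_path(q_table, "A", "D"))
```

This should print `['A', 'C', 'D']`. Ranking by overhead alone would favour A-B-D (overhead 2 versus 4), but with the assumed weights the occupancy term makes A-B-D cost 10 against 5 for A-C-D, so the learned policy steers around the congested links. Undiscounted returns are chosen so that "largest cumulative reward" coincides with the lowest weighted path cost.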
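The KSP plus First-Fit baseline, and the first-attempt failures it can suffer under the wavelength consistency requirement, can be sketched as follows. This assumes the candidate paths have already been produced by a K-shortest-paths search (not shown) and that link occupancy is known as a set of busy wavelength indices per undirected link. The helper names `first_fit` and `ksp_first_fit` and the sample occupancy data are hypothetical.

```python
from typing import Optional


def path_links(path):
    """Undirected links traversed by a node path, e.g. A-B-D -> {A,B}, {B,D}."""
    return [frozenset(pair) for pair in zip(path, path[1:])]


def first_fit(path, occupied, num_wavelengths) -> Optional[int]:
    """Lowest-indexed wavelength free on every link of the path, or None.

    Using a single wavelength end to end enforces wavelength consistency;
    None means wavelength allocation fails on this path."""
    links = path_links(path)
    for w in range(num_wavelengths):
        if all(w not in occupied.get(link, set()) for link in links):
            return w
    return None


def ksp_first_fit(candidates, occupied, num_wavelengths):
    """Try precomputed candidate paths in order and return (path, wavelength)."""
    for path in candidates:
        w = first_fit(path, occupied, num_wavelengths)
        if w is not None:
            return list(path), w
    return None


if __name__ == "__main__":
    # Hypothetical occupancy: busy wavelength indices per link, 4 wavelengths total.
    occupied = {
        frozenset({"A", "B"}): {0, 1},
        frozenset({"B", "D"}): {2, 3},
        frozenset({"A", "C"}): {1},
        frozenset({"C", "D"}): {0},
    }
    candidates = [["A", "B", "D"], ["A", "C", "D"]]  # assumed KSP order
    print(first_fit(["A", "B", "D"], occupied, 4))  # None
    print(ksp_first_fit(candidates, occupied, 4))   # (['A', 'C', 'D'], 2)
```

On the first candidate A-B-D every link still has free wavelengths (2 and 3 on A-B, 0 and 1 on B-D), yet no single wavelength is free end to end, so allocation on that path fails and the search must fall back to the next candidate.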
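The experience replay mechanism of the DQN-based algorithm can be sketched as a bounded buffer of past transitions that is sampled uniformly at random. This is a generic sketch under stated assumptions, not the thesis's implementation: the transition fields, the class `ReplayBuffer`, and the helper `td_targets` are hypothetical, and the neural network that replaces the Q-table is represented only by `q_values`, an assumed callable returning one Q-value per action.

```python
import random
from collections import deque, namedtuple

# One step of the routing agent; the field layout is an assumption.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-size FIFO store of past transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling mixes transitions from different episodes, which
        # reduces the correlation between consecutive steps of one route search.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, q_values, gamma=0.9):
    """Q-learning targets: r for terminal steps, else r + gamma * max_a Q(s', a)."""
    return [
        t.reward if t.done else t.reward + gamma * max(q_values(t.next_state))
        for t in batch
    ]


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=3, seed=1)
    for step in range(5):
        buf.push(step, 0, -1.0, step + 1, step == 4)
    print(len(buf))  # 3: only the three most recent transitions are kept
    batch = buf.sample(2)
    print(td_targets(batch, lambda s: [0.0, 1.0]))  # stand-in for the network
```

In a full DQN the targets would be regressed against the network's predictions for the sampled batch. The fixed capacity bounds memory while still letting each stored transition be reused across several updates, which is where the improved use of historical data comes from.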