With the spread of smartphones, online ride-hailing and ride-hailing platforms have entered public view and quickly become a popular choice for travel. The number of online ride-hailing users in China has grown year by year, reaching 400 million by 2022. An efficient vehicle scheduling strategy therefore matters in several ways: it increases the income of the platform and its drivers, helps alleviate traffic congestion, and improves the travel experience, comfort, and satisfaction of passengers. For example, dispatching vehicles in advance to areas that historical data show to be busy can raise the order response rate, so that passengers in crowded areas do not face long waits while drivers in quiet areas have no orders to pick up.

Research on vehicle scheduling mainly builds multi-agent reinforcement learning models from historical data in order to dispatch vehicles so that more orders are served and the benefits of the platform and drivers improve. Two main challenges stand in the way of a higher order response rate and higher income. First, each driver is treated as an independent individual who greedily selects the nearest order, which misses the overall optimal combination of assignments within the region. Second, although the historical data are large, they cannot cover every situation, so offline training easily produces out-of-distribution data (state-action pairs that do not appear in the historical data). The value function often cannot estimate the value of such data accurately, and the target value then deviates from the actual value.

To address these two challenges, this paper proposes Shared Attention Reinforcement Learning (SARL), based on shared attention, and the Uncertain Weighting Harmonic Twin-Critic Network (UWTC), based on uncertainty weighting. SARL builds on multi-agent reinforcement learning and adds a shared attention module with multiple attention heads. By taking a shared vector as input, the agents attend to each other's positions, so that vehicle scheduling targets the globally optimal solution rather than a greedy suboptimal one. UWTC adds an uncertainty weighting module and a harmonic twin-critic network module to the Actor-Critic algorithm in order to estimate the value function more accurately and thereby select better policies.

The innovations of this paper are as follows: (1) Two vehicle scheduling algorithms based on multi-agent reinforcement learning, SARL and UWTC, are proposed; (2) A shared attention module based on the multi-head attention mechanism is proposed for vehicle scheduling, allowing vehicles to attend to and cooperate with each other to achieve the optimal combination of assignments within the grid; (3) A multi-agent reinforcement learning algorithm based on an uncertainty weighting module and a harmonic twin-critic module is proposed for large-scale vehicle scheduling across different regions. In addition, the two proposed models were tested experimentally on real scenarios and real datasets. The results show that both SARL and UWTC improve the order response rate and total service value compared with existing mainstream models.
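
The first challenge, greedy nearest-order selection missing the best joint assignment, can be illustrated with a toy example. The sketch below is not SARL or UWTC; it only contrasts a greedy rule with a brute-force joint assignment. The function names, the Manhattan distance, and the coordinates are all assumptions made for illustration, and the brute-force search is only practical for a handful of vehicles.

```python
from itertools import permutations


def manhattan(a, b):
    """Manhattan distance between two grid points (an illustrative cost)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def total_distance(drivers, orders, assignment):
    """Sum of pickup distances when driver i serves orders[assignment[i]]."""
    return sum(manhattan(d, orders[j]) for d, j in zip(drivers, assignment))


def greedy_assignment(drivers, orders):
    """Each driver, in turn, takes the nearest order that is still free.

    Assumes len(orders) >= len(drivers).
    """
    remaining = set(range(len(orders)))
    assignment = []
    for d in drivers:
        # Ties are broken by the lower order index.
        j = min(remaining, key=lambda k: (manhattan(d, orders[k]), k))
        remaining.remove(j)
        assignment.append(j)
    return assignment


def joint_assignment(drivers, orders):
    """Brute-force search for the assignment with the lowest total distance."""
    candidates = permutations(range(len(orders)), len(drivers))
    best = min(candidates, key=lambda p: total_distance(drivers, orders, p))
    return list(best)


if __name__ == "__main__":
    # Hypothetical positions: two drivers and two orders on a line.
    drivers = [(0, 0), (3, 0)]
    orders = [(2, 0), (-4, 0)]

    greedy = greedy_assignment(drivers, orders)
    joint = joint_assignment(drivers, orders)
    print("greedy:", greedy, total_distance(drivers, orders, greedy))
    print("joint: ", joint, total_distance(drivers, orders, joint))
```

In this toy case the first driver greedily takes the closer order and forces the second driver into a long trip, whereas the joint search swaps the two and lowers the total pickup distance. A learned cooperative policy would aim to approximate this kind of joint decision without an exhaustive search.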