| As car ownership and the number of drivers increase year by year, problems such as traffic safety, environmental pollution, and traffic congestion have become increasingly prominent. Intelligent vehicles offer a way to address these problems at their root, and behavior decision-making is one of the core technologies of an intelligent vehicle's autonomous driving system. Research on behavior decision-making technology for intelligent vehicles therefore has both theoretical significance and engineering value. To address the low online policy learning efficiency of traditional deep reinforcement learning behavior decision-making methods, this paper studies a hierarchical, divide-and-conquer imitation-reinforcement learning behavior decision-making method. The method takes the motion state information of the host vehicle and surrounding vehicles, together with image information from the on-board visual sensor, as input, and outputs the position of the moving target. The main research contents are as follows: (1) Macro behavior decision-making models for intelligent vehicles are designed. The definition and role of macro behavior decision-making within behavior decision-making are explained, and two macro behavior decision-making models are designed: 1) a supervised imitation learning model based on a convolutional neural network, which learns a human-like macro behavior decision-making policy offline from expert drivers' demonstration data; 2) a demonstration reinforcement learning model based on DQfD, for which a compound reward function based on the driving safety field is designed, and which uses a small number of expert driver demonstrations to guide online policy learning and to explore an optimized macro behavior decision-making policy. Independent simulation experiments on macro behavior decision-making are carried out in the Unity ML-Agents virtual environment to verify the feasibility of the macro behavior decision-making models designed in this paper. (2) Refined behavior decision-making models for intelligent vehicles are designed. The purpose of refined behavior decision-making is explained, and two refined behavior decision-making problems are defined: following/cruising and active lane changing. A following/cruising behavior decision-making model based on deep deterministic policy gradient is designed, which accelerates policy learning by introducing a deterministic policy assumption. An active lane-changing behavior decision-making model based on soft actor-critic is designed, which handles the problem of multiple feasible insertion points during active lane changing by introducing a stochastic policy assumption. Virtual environments are constructed based on a randomized vehicle motion model, and numerical simulation experiments are carried out to independently verify the effectiveness of the refined behavior decision-making models designed in this paper. Methods of combining the macro and refined decision-making policies are discussed, including hierarchical reinforcement learning and imitation-reinforcement learning, and the imitation-reinforcement learning method is finally selected. (3) Comprehensive simulation tests are carried out. To evaluate the behavior decision-making method designed in this paper along several dimensions, comprehensive simulation tests are carried out in the CARLA autonomous driving virtual environment, and the method is compared with typical existing baseline methods. A policy learning speed test compares the online policy learning convergence speed of this method with that of the reinforcement learning baseline method. Point-to-point travel tests in Town01 and Town02 evaluate the effectiveness and generalization of the policies learned by this method and by the baseline methods. Robustness tests are carried out in Town01 under typical conditions such as evening, night, and heavy fog to evaluate the robustness of the policies learned by this method and by the baseline methods. The comprehensive simulation test results show that the policy learning convergence speed of the behavior decision-making method designed in this paper is significantly faster than that of the non-hierarchical, non-divide-and-conquer reinforcement learning baseline method, and that the effectiveness, generalization, and robustness of its policy are significantly improved compared with the baseline methods. The demonstration set expansion mechanism adopted in this paper, which aims to enhance the robustness of the behavior decision-making policy, is also shown to be effective. |
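
The hierarchical, divide-and-conquer structure summarized above can be pictured as a macro policy that selects a behavior and a per-behavior refined policy that outputs the position of the moving target. The following is only a minimal sketch under stated assumptions, not the models designed in this paper: the names `Observation`, `macro_policy`, `follow_policy`, `lane_change_policy`, and `decide`, the 2-second headway rule, and the 3.5 m lane width are all hypothetical, and the hand-written rules merely stand in for the learned networks.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Tuple


class MacroBehavior(Enum):
    FOLLOW_OR_CRUISE = "follow_or_cruise"
    CHANGE_LANE = "change_lane"


@dataclass(frozen=True)
class Observation:
    """Hypothetical, heavily reduced state; the paper also uses camera images."""
    ego_speed: float     # host vehicle speed, m/s
    lead_gap: float      # distance to the vehicle ahead, m
    adjacent_gap: float  # free gap in the adjacent lane, m


# (longitudinal, lateral) offset of the moving target, in meters
Target = Tuple[float, float]
LANE_WIDTH = 3.5  # assumed lane width, m


def macro_policy(obs: Observation) -> MacroBehavior:
    # Stand-in for the learned macro policy (imitation learning or DQfD).
    # Assumption: "blocked" means the gap is below a 2-second headway.
    blocked = obs.lead_gap < 2.0 * obs.ego_speed
    if blocked and obs.adjacent_gap > obs.lead_gap:
        return MacroBehavior.CHANGE_LANE
    return MacroBehavior.FOLLOW_OR_CRUISE


def follow_policy(obs: Observation) -> Target:
    # Stand-in for a deterministic actor: one target per observation.
    return (min(obs.lead_gap, 2.0 * obs.ego_speed), 0.0)


def lane_change_policy(obs: Observation) -> Target:
    # Stand-in for a stochastic actor; only a fixed "mean" target is returned
    # here, whereas a real stochastic policy would sample an insertion point.
    return (0.5 * obs.adjacent_gap, LANE_WIDTH)


REFINED_POLICIES: Dict[MacroBehavior, Callable[[Observation], Target]] = {
    MacroBehavior.FOLLOW_OR_CRUISE: follow_policy,
    MacroBehavior.CHANGE_LANE: lane_change_policy,
}


def decide(obs: Observation) -> Tuple[MacroBehavior, Target]:
    """Pick a macro behavior, then delegate to its refined policy."""
    behavior = macro_policy(obs)
    return behavior, REFINED_POLICIES[behavior](obs)


if __name__ == "__main__":
    print(decide(Observation(ego_speed=10.0, lead_gap=50.0, adjacent_gap=30.0)))
    print(decide(Observation(ego_speed=10.0, lead_gap=10.0, adjacent_gap=40.0)))
```

Keeping each refined policy behind a common call signature is what makes the divide-and-conquer split possible in this sketch: each sub-policy can be trained and tested independently and then plugged into the dispatch table.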