As the number of motor vehicles in China continues to grow and the age limit for driver's license applications has been raised to 70, driving safety in the cooperative human-vehicle-road environment has become an increasingly prominent problem. Existing research shows that traffic accidents are closely related to abnormal driver behavior in the driving environment. Timely monitoring and analysis of driving styles, together with detection of road anomalies, help identify latent driving characteristics early and have become an important research field for addressing road safety problems. Intelligent decision-making technology for assisted and autonomous driving can evaluate the driver's subjective operations, encourage drivers to improve their driving behavior, and raise driving efficiency while still meeting driving safety indicators, which offers an effective way to address these problems.
Accordingly, this paper takes teaching strategies for autonomous driving decision-making and the evaluation of driving behavior norms as its research objectives. Building on the extraction of real road-link data from AutoNavi (AMAP) and the training of a driving agent on that data, drivers' personal smart terminals are introduced to classify driving behaviors and to evaluate and display driving scores. To collect driving data from many drivers at the same time, a cloud computing mode is used so that driving events are matched and scored on a cloud server. On this basis, to meet the needs of different evaluation scenarios, two driving-action models are proposed: a typical-action driving strategy teaching decision model based on a double deep Q-network, and a continuous-action driving strategy teaching optimization model based on TD3. The validity of the constructed models is verified. The main contents include:
(1) A reinforcement-learning-oriented simulation environment that follows real road driving rules is constructed. Application programming interfaces for route planning and traffic conditions, point-of-interest search, weather query, and coordinate transformation are built on the AutoNavi open platform, and a design method is given for registering the environment and its agent through the interface of the Gym artificial intelligence platform. Combined with quantified feature parameter indices, this provides a platform and a data set for research on training driving-behavior agents in a simulated real-road environment.
(2) The shortcomings of data collection in driving behavior research are analyzed, and an online method for collecting AutoNavi positioning and navigation data is proposed. An application installed and run on personal smart terminals collects the driving data, a filtering algorithm processes it, and the matching, storage, and processing of driving events are completed in the cloud.
(3) For five typical discrete driving actions, an experience-based policy learning method is introduced that combines a vector representation of urban road information from AutoNavi driving data with a prioritized driving experience replay mechanism. The target network is frozen, and environment states and actions are evaluated with the double deep Q-network. Generalized day and night driving scenarios are designed and constructed, so that optimal driving behavior policies are learned from interaction data that combine multiple risk dimensions, which improves the robustness of the algorithm.
(4) In real driving tasks, actions are continuous and complex, or the number of states is large, and approximating continuous actions with many fine-grained discrete actions increases the training cost. A continuous-action strategy teaching optimization model based on artificial potential field theory is proposed to explore the reward cost. On this basis, online driving data are introduced, and the FDTW fast evaluation algorithm compares the similarity of two sequences along the time dimension to realize a reasonable scoring mechanism. Together these form a complete scheme for intelligent driving strategy teaching and user driving behavior evaluation.
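The sequence comparison behind the scoring mechanism can be illustrated with a minimal sketch. The abstract does not specify the FDTW algorithm, so the code below substitutes classic exact dynamic time warping (DTW), not a fast variant. The function names `dtw_distance` and `driving_score`, the use of one-dimensional sequences such as speed samples, and the mapping from distance to a 0-100 score are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two 1-D sequences.

    Runs in O(len(a) * len(b)) time; this is plain DTW, not the FDTW
    variant named in the abstract.
    """
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # prev[j] is the best cumulative cost of aligning the rows processed
    # so far with b[:j]; index 0 is the empty-prefix boundary.
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            # Extend the cheapest of: insertion, deletion, or match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]


def driving_score(driver: Sequence[float], reference: Sequence[float],
                  scale: float = 1.0) -> float:
    """Map DTW distance to a score in (0, 100].

    Hypothetical scoring rule: the distance is normalised by the combined
    sequence length, then squashed so that identical sequences score 100.
    """
    distance = dtw_distance(driver, reference)
    normalised = distance / (len(driver) + len(reference))
    return 100.0 / (1.0 + normalised / scale)
```

In this sketch, `reference` would stand for a sequence produced by the trained teaching policy and `driver` for the sequence recorded from a user on the same road segment. Because DTW aligns the two sequences along the time axis, a driver who performs the same maneuver slightly earlier or later is not penalised as heavily as with a point-by-point comparison.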