The Research On Balanced Training For End-to-end Self-driving Decision And Control

Posted on: 2022-01-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Yuan
GTID: 1522306836492284
Subject: Control Science and Engineering
Abstract/Summary:
Self-driving vehicles are a critical strategic direction for the global automobile industry and one of the core technologies in China's effort to build a strong transportation system. Today's mainstream self-driving systems draw clear task boundaries between perception, localization, decision-making, planning, control, and other modules, which reduces the structural coupling between modules. However, such a system has difficulty responding correctly and in time when it faces a large number of dynamic traffic participants with uncertain intentions, and optimizing only some of the modules cannot fundamentally guarantee the safety of the whole system. The existing architecture has therefore become one of the main bottlenecks for the large-scale deployment of self-driving vehicles.

In recent years, more and more researchers have turned to self-driving architectures based on end-to-end learning. Such an architecture maps raw sensor data directly to the vehicle's decision-making and control outputs through a neural network. It preserves the coupling between modules and lets the data establish implicit associations on its own, which both weakens the influence of errors in local tasks on the decision-making and control results and reduces the computational redundancy caused by excessive task division. However, end-to-end driving models suffer from serious imbalanced-training problems, because the learning value or the labels of the training data are unevenly distributed. This dissertation therefore focuses on the imbalanced training problem in end-to-end learning. Specifically, for the imbalanced sampling problem in decision-making model training, it studies balanced training based on a front-end balanced sampling method; for the imbalanced dataset distribution in steering estimation and in virtual-to-real driving model transfer, it studies balanced training based on a back-end gradient balancing method. The main research results are as follows:

(1) For the problem of imbalanced memory sampling in decision-making model training, this dissertation proposes a balanced training method based on multi-reward and priority sampling mechanisms. Deep reinforcement learning models, especially classical deep Q-learning, can effectively search for the optimal driving decision. However, such methods face an imbalance in the learning value of their data: during training the model samples a large amount of data with low learning value and rarely sees enough high-value data. Building on the priority sampling mechanism, this dissertation designs a balanced training method that encodes and samples high-value and low-value data with different priorities. At the same time, the function approximator is optimized through a decomposition into multiple reward functions, which maximizes the effect of high-value samples and improves training. Training and testing experiments on a highway simulation platform demonstrate that the proposed method improves decision-making performance for accelerating, decelerating, changing to the left lane, and changing to the right lane.

(2) For the problem of the imbalanced distribution of the training dataset for the steering estimation model, this dissertation proposes a cost-sensitive balanced loss function based on a three-factor model. In normal driving, large steering maneuvers are rare, and driving on straight roads is far more common than driving through curves. The distribution of steering values in a driving dataset is therefore highly imbalanced: the steering histogram is low on both sides and high in the middle. A model trained on such a dataset estimates steering well on straights and poorly in corners. The three-factor model designed in this dissertation acts directly on the loss function. By adjusting the three factor parameters, the loss contribution of rarely occurring steering values is amplified, while that of frequently occurring steering values is largely preserved, so that model training is balanced. Experiments demonstrate that the proposed method improves the accuracy of end-to-end steering estimation across different datasets and models.

(3) For the problem of manual parameter tuning in the three-factor model, this dissertation proposes a cost-sensitive balanced loss function with adaptive gradient hedging, which further improves the balanced training of the end-to-end steering estimation model. The three-factor model requires its three parameters to be adjusted by hand, which limits the generality of the method. To reduce the difficulty of parameter tuning, this dissertation further analyzes the distribution pattern of the dataset and designs an adaptive gradient hedging factor based on that pattern, from which an adaptive cost-sensitive loss function for balanced training is constructed. Experiments show that the proposed method further improves the accuracy of the steering estimation model.

(4) For the problem of the imbalanced distribution of the training dataset in the virtual-to-real transfer task for end-to-end driving models, this dissertation proposes a cost-sensitive balanced adversarial learning model, which transfers a driving model trained in a simulator to the real environment under balanced training. Traditional domain transfer methods require intermediate training and supervision labels, or two-stage training. This dissertation proposes a one-stage training framework that trains a driving domain transfer model directly. It also applies the aforementioned cost-sensitive loss function to the imbalance in driving datasets, which hedges the gradient imbalance. Experiments demonstrate that the proposed method transfers the virtual driving model to the real environment under balanced training conditions.

In summary, this dissertation establishes general-purpose balanced training methods for end-to-end learning of self-driving decision-making and control. The methods center on the front-end balanced sampling method and the back-end gradient balancing method, and the dissertation demonstrates and applies them to decision-making, steering estimation, and virtual-to-real driving model transfer for self-driving vehicles.
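
The priority sampling idea in result (1) can be illustrated with a minimal sketch. The abstract does not give the dissertation's priority encoding or its multi-reward decomposition, so the code below is only the generic proportional scheme from prioritized experience replay. The names `sampling_probabilities` and `sample_indices`, the exponent `alpha`, the constant `eps`, and the toy priorities are all assumptions made for illustration.

```python
import random


def sampling_probabilities(priorities, alpha=0.6, eps=1e-6):
    """Convert non-negative priorities into sampling probabilities.

    alpha = 0 gives uniform sampling; alpha = 1 samples in direct
    proportion to priority. alpha and eps are assumed hyperparameters,
    not values taken from the dissertation.
    """
    scaled = [(p + eps) ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(priorities, batch_size, alpha=0.6, rng=None):
    """Draw a training batch of memory indices, favoring high priorities."""
    rng = rng or random.Random()
    probs = sampling_probabilities(priorities, alpha)
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)


# Hypothetical replay memory: nine low-value transitions, one high-value one.
priorities = [0.1] * 9 + [5.0]
uniform_probs = sampling_probabilities(priorities, alpha=0.0)
priority_probs = sampling_probabilities(priorities, alpha=1.0)
batch = sample_indices(priorities, batch_size=1000, alpha=1.0,
                       rng=random.Random(0))
high_value_share = batch.count(9) / len(batch)
```

With uniform sampling the single high-value transition would fill roughly one tenth of each batch; under proportional prioritization it fills most of it, which is the rebalancing effect the abstract describes.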
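
The cost-sensitive loss idea in results (2) and (3) can also be sketched under stated assumptions. The abstract does not define the three factors or the adaptive hedging factor, so the code below is not the dissertation's loss. It is a generic inverse-frequency weighting of the squared error, in which the most frequent steering bin keeps weight 1 and rarer bins are amplified. The functions `bin_index`, `bin_weights`, and `weighted_mse`, the exponent `gamma`, the bin count, and the toy data are hypothetical.

```python
def bin_index(angle, low=-1.0, high=1.0, n_bins=5):
    """Map a steering angle in [low, high] to a histogram bin index."""
    clipped = min(max(angle, low), high)
    idx = int((clipped - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)


def bin_weights(angles, n_bins=5, gamma=0.5):
    """Per-bin loss weights from the steering histogram.

    The most frequent bin gets weight 1 (its loss is maintained); rarer
    bins get (max_count / count) ** gamma > 1 (their loss is amplified).
    gamma is an assumed tuning parameter. Empty bins get weight 0.0,
    which is never used because no training sample falls into them.
    """
    counts = [0] * n_bins
    for a in angles:
        counts[bin_index(a, n_bins=n_bins)] += 1
    most = max(counts)
    return [(most / c) ** gamma if c else 0.0 for c in counts]


def weighted_mse(preds, targets, weights, n_bins=5):
    """Mean squared error with each sample scaled by its bin weight."""
    total = 0.0
    for p, t in zip(preds, targets):
        total += weights[bin_index(t, n_bins=n_bins)] * (p - t) ** 2
    return total / len(targets)


# Toy dataset: eight straight-road samples and two sharp-corner samples.
targets = [0.0] * 8 + [-0.9, 0.9]
preds = [t + 0.1 for t in targets]  # the same error on every sample
weights = bin_weights(targets)
plain = weighted_mse(preds, targets, [1.0] * 5)
balanced = weighted_mse(preds, targets, weights)
```

Although every sample has the same prediction error, the corner samples contribute a larger share of the balanced loss than of the plain loss, so their gradients are no longer swamped by the straight-road majority.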
Keywords/Search Tags:Decision-making, control, balanced training, end-to-end learning, reinforcement learning, adversarial learning