Deep Reinforcement Learning (DRL) is an automatic policy-learning paradigm that combines deep learning with a small amount of reward signal. Because deep learning demands large amounts of data and reward signals are sparse, DRL typically requires a large volume of training data to achieve its task objectives. Given the cost of interaction in the real world, it is important to study efficient DRL algorithms, i.e., methods that train a policy of comparable performance using as few interaction samples as possible. This research addresses the problem from two aspects: data augmentation and the training framework.

In the data augmentation aspect, the goal is to construct new samples from existing ones and expand the data set, improving data efficiency beyond the original data. However, previous methods originated in traditional image classification and do not account for the semantic consistency between image states and actions in decision-making scenarios. This research proposes Efficient DRL with Symmetric Consistency, which achieves correct augmentation by constructing semantically consistent symmetric transition samples. To fully exploit these symmetric properties, the Symmetric Deep Q Network (Sym.DQN) is proposed, which jointly optimizes on the original and augmented samples and thereby improves data efficiency.

In the training framework aspect, previous algorithms adopt a fixed update-to-interaction ratio, which causes under-training or over-fitting across different training stages and tasks and limits performance. To address this, the difficulty of fitting samples is measured by the local standard deviation of the loss, and a dynamic threshold is constructed from the exponential moving average of that standard deviation. Building on these components, Efficient DRL with Flexible Update is proposed, which adjusts the update-to-interaction ratio according to the complexity of the training samples, alleviating under-training and over-fitting and reducing computational cost while preserving performance.

Experiments are conducted in the Arcade Learning Environment following the Atari 100K video game benchmark. The results show that the Symmetric Deep Q Network outperforms previous models, and that the Flexible Update mechanism reduces training cost across tasks while improving overall data efficiency.
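To make the two mechanisms above concrete, the following is a minimal sketch, not the paper's implementation: all names, constants, the left/right action-mirror table, and the window and EMA settings are illustrative assumptions. The first helper builds a semantically consistent symmetric transition by flipping both image states and mirroring the action; the second tracks the local standard deviation of recent losses against an exponential-moving-average threshold to pick the number of updates per environment step.

```python
from collections import deque
import statistics

# Assumed action mirror for a left/right-symmetric task (illustrative):
# 0 = noop, 1 = move left, 2 = move right.
ACTION_MIRROR = {0: 0, 1: 2, 2: 1}

def symmetric_transition(state, action, reward, next_state):
    """Construct a semantically consistent symmetric sample by
    horizontally flipping both image states and mirroring the action.
    States are nested lists of pixel rows; reward is unchanged."""
    flip = lambda img: [row[::-1] for row in img]
    return flip(state), ACTION_MIRROR[action], reward, flip(next_state)

class FlexibleUpdateScheduler:
    """Sketch of a flexible update-to-interaction ratio: the local
    standard deviation of recent losses measures sample difficulty,
    and its exponential moving average serves as a dynamic threshold."""

    def __init__(self, window=32, ema_beta=0.99, base_updates=1, max_updates=4):
        self.losses = deque(maxlen=window)  # recent per-batch losses
        self.ema_std = None                 # EMA of loss std = dynamic threshold
        self.ema_beta = ema_beta
        self.base_updates = base_updates
        self.max_updates = max_updates

    def record(self, loss):
        self.losses.append(float(loss))

    def updates_per_step(self):
        if len(self.losses) < 2:
            return self.base_updates
        std = statistics.pstdev(self.losses)  # local loss std = difficulty
        if self.ema_std is None:
            self.ema_std = std
        else:
            self.ema_std = self.ema_beta * self.ema_std + (1 - self.ema_beta) * std
        # Hard-to-fit batches (std above the dynamic threshold) receive
        # more gradient updates; easy ones fall back to the base ratio,
        # saving computation.
        return self.max_updates if std > self.ema_std else self.base_updates
```

In a training loop, each environment step would record the latest loss and then perform `updates_per_step()` gradient updates, so stable (easy) phases run at the cheap base ratio and difficult phases train harder.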