The fifth-generation (5G) technology is expected to support a rapid increase in infrastructure and mobile user subscriptions, with a growing number of remote radio heads (RRHs) per unit area in the Cloud Radio Access Network (C-RAN). From the economic point of view, minimizing the energy consumption of the RRHs is a challenging issue; from the environmental point of view, achieving "greenness" in wireless networks is one of the many goals of telecommunication operators. Firstly, existing schemes based on deep reinforcement learning (DRL) do not consider significant user-BS characteristics. In this case, users are required to send their information to RRHs for signaling, which increases the signaling overhead, and performance is limited because the UE's movement cannot be captured in a mobility scenario. Secondly, methods based on reinforcement learning require a carefully defined reward function: when the scenario changes, the parameters of the reward must also be manually adjusted, which makes the algorithm inflexible. Lastly, reinforcement-learning-based energy optimization algorithms directly formulate the energy consumption problem as a Markov decision process (MDP) to obtain end-to-end output; however, previously learned results fail in a new scenario, and retraining there is costly.

Therefore, we focus on an energy consumption optimization system that can dynamically activate or deactivate a cell according to traffic conditions, aiming to minimize the energy consumption of the entire system while guaranteeing the quality-of-service (QoS) requirements of users. Firstly, we propose two revised algorithms based on deep Q-network (DQN) and dueling deep Q-network (dueling DQN), named relational DQN and relational dueling DQN, which use the original relational matrix between RRHs and users as input and employ a convolutional neural network (CNN) to extract features. Then, to make our scheme more flexible across scenarios, an adaptive reward is formulated, which automatically adjusts the parameters of the reward function to balance energy consumption against the QoS requirements of users during the learning process. Lastly, we further propose an energy consumption optimization strategy based on DRL and transfer learning. Extensive simulations reveal that the proposed scheme based on CNN and DRL achieves a better balance between the QoS requirements of users and system energy consumption, that the proposed adaptive reward maintains this balance by automatically adjusting the parameters of the reward function, and that, compared with retraining in the new scenario, the scheme combining DRL and transfer learning speeds up learning.
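To make the relational-matrix input concrete, the sketch below builds a hypothetical RRH-user affinity matrix from inverse distances (a stand-in for the channel or association information the paper uses; the actual matrix definition and CNN architecture are not specified here) and runs one plain 2-D convolution over it, mimicking the CNN feature-extraction step:

```python
import numpy as np

def relational_matrix(rrh_pos, user_pos):
    """Relational matrix R[i, j]: inverse-distance 'affinity' between
    RRH i and user j (a simple stand-in for channel gain / path loss)."""
    diff = rrh_pos[:, None, :] - user_pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return 1.0 / (1.0 + dist)

def conv2d_valid(x, kernel):
    """Plain 2-D 'valid' convolution: a stand-in for one CNN layer
    that extracts local features from the relational matrix."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
rrh_pos = rng.uniform(0, 100, size=(8, 2))    # 8 RRHs on a 100 m square
user_pos = rng.uniform(0, 100, size=(12, 2))  # 12 users
R = relational_matrix(rrh_pos, user_pos)      # shape (8, 12)
features = conv2d_valid(R, np.ones((3, 3)) / 9.0)  # 3x3 averaging kernel
```

Because the matrix is indexed by (RRH, user) pairs, user movement reshuffles its columns rather than the observation format, which is what lets a convolutional front end capture mobility without extra per-user signaling.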
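The adaptive reward can be viewed as a weighted trade-off whose weight is tuned online instead of hand-tuned per scenario. The following sketch uses hypothetical names and a deliberately simple update rule (the paper's actual formulation may differ): the QoS weight grows while the observed violation rate exceeds a target, and decays toward a floor otherwise:

```python
class AdaptiveReward:
    """Reward = -(energy) - w * qos_violations, where the weight w is
    adapted during learning rather than set manually per scenario."""

    def __init__(self, w_init=1.0, target_violation_rate=0.05, step=0.1):
        self.w = w_init                       # QoS penalty weight
        self.target = target_violation_rate   # acceptable violation rate
        self.step = step                      # multiplicative adaptation step
        self.violations = 0
        self.samples = 0

    def __call__(self, energy, qos_violations):
        # Track the empirical QoS-violation rate seen so far.
        self.samples += 1
        self.violations += int(qos_violations > 0)
        rate = self.violations / self.samples
        # Nudge the weight toward meeting the QoS target.
        if rate > self.target:
            self.w *= 1.0 + self.step                      # penalize QoS harder
        else:
            self.w = max(0.1, self.w * (1.0 - self.step))  # favor energy saving
        return -energy - self.w * qos_violations

reward_fn = AdaptiveReward()
r = reward_fn(energy=3.2, qos_violations=2)  # negative: energy + weighted QoS penalty
```

The point of the design is that the same learning loop can be dropped into a new traffic scenario without re-deriving the reward coefficients by hand.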
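The transfer-learning step amounts to warm-starting the network in the new scenario from parameters learned in the old one, instead of retraining from scratch. A minimal sketch, with hypothetical parameter names and assuming weights are stored as a dict of arrays (the paper's actual transfer procedure is not detailed here):

```python
import numpy as np

def transfer_init(source, target, reinit_keys=()):
    """Warm-start target-network parameters from a source scenario.

    Copy each source array whose shape matches the target's; keep the
    target's fresh initialization otherwise, or when the layer is listed
    in reinit_keys (e.g. the Q-value head when the number of RRHs, and
    hence the action space, changes between scenarios)."""
    merged = {}
    for name, fresh in target.items():
        src = source.get(name)
        if name not in reinit_keys and src is not None and src.shape == fresh.shape:
            merged[name] = src.copy()    # transferred knowledge
        else:
            merged[name] = fresh.copy()  # (re)trained from scratch
    return merged

# Example: the conv features transfer, the head is rebuilt for 6 actions.
source = {"conv_w": np.ones((3, 3)), "head_w": np.ones(4)}
target = {"conv_w": np.zeros((3, 3)), "head_w": np.zeros(6)}
params = transfer_init(source, target)
```

Only the scenario-specific layers then need retraining, which is the mechanism behind the reported speed-up over full retraining.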