| With the continuous improvement of intelligent terminal equipment and communication technology, the volume of application task data keeps growing (for example, streaming video and face recognition tasks), and these applications usually need computation-intensive resources to deliver high-quality service. If a single user executes all of its application tasks locally, the result is high energy consumption and long delay. Mobile devices can therefore relieve the pressure of these computation-intensive tasks by transferring them to other devices, an action called task offloading. The offloading target can be a base station, the cloud, or another device. This thesis focuses on mutual offloading among mobile devices in a device-to-device (D2D) communication network. The task offloading decision is an optimization problem whose goal is to reduce energy consumption and delay, and reinforcement learning is well suited to optimization in this environment. Reinforcement learning has been widely studied for task offloading, but rarely in D2D networks. This thesis first analyzes the D2D environment and proposes a D2D task offloading model that accounts for heterogeneity and mobility. The model covers task generation, mutual offloading, execution, and mobility for each mobile device in a D2D environment, and it can calculate each device's energy consumption and delay. It also introduces a discarding mechanism, so that it realistically describes the structure and characteristics of a D2D network. By analyzing the model, this thesis summarizes the problems to be solved and the difficulties to be faced, treats the multi-user D2D offloading problem as a multi-agent hybrid game problem, and proposes a method that balances space, energy consumption, and time allocation (DCDO) based on deep reinforcement learning (DRL). The method supports large state and action spaces and lets the agents learn simultaneously yet separately, so that they work collaboratively. Each agent is a semi-joint learner that can learn and execute independently. In this way, multi-agent decision-making in the D2D environment can be optimized. In the experimental verification stage, this thesis compares the proposed algorithm and model against commonly used task offloading strategies. Experimental results show that the framework meets the optimization requirements of the objective function, which confirms its effectiveness. It converges in the experimental environment after about 400 rounds. Overall, it is about 40% to 80% more effective than the comparison algorithms. Under the chosen reward structure, it saves about 20% to 50% in energy consumption and about 40% in time delay compared with the comparison algorithms, and it essentially eliminates discards. The robustness of the framework is also verified by varying the environment parameters. Finally, this thesis analyzes the advantages and disadvantages of task offloading methods based on deep reinforcement learning and outlines possible directions for future work. |