| With the development of unmanned aerial vehicle (UAV) technology and artificial intelligence, UAVs have shifted from single-platform operation to cluster operation. New concepts have also emerged, such as the reconfigurable UAV, in which multiple UAVs are rigidly combined to carry out a given task. This concept has numerous potential applications in logistics, transportation, rescue, and disaster relief, among other fields. However, reconfigurable modular UAV clusters still face many technical difficulties in motion control, decision making, algorithm deployment, and other aspects. Multi-agent reinforcement learning algorithms perform well on collaborative decision making for traditional UAV clusters and adapt well to the environment. Taking as its application background the autonomous cooperative multi-objective logistics transportation of reconfigurable UAV clusters and reconfiguration-based traversal of complex environments, this thesis mainly carries out the following research work: (1) A reconfigurable modular UAV scheme is proposed, including the UAV module structure, hardware design, and distributed control system framework. This scheme solves the problem that traditional reconfigurable UAVs have difficulty performing self-reconfiguration in the air. Traditional reconfigurable UAV systems use a passive docking mechanism, which requires a long time and a specific sequence to reconfigure in the air. In this thesis, a vector tilting structure and a controllable active docking mechanism are designed, and a distributed controller is introduced to greatly reduce the computational complexity of the control allocation matrix during reconfiguration. The experimental results show that this scheme can shorten the self-reconfiguration time to less than 2 s and keep the maximum Euler-angle amplitude within 6°. (2) Two ideas are proposed to improve the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm and to solve
the problem that MADDPG cannot be applied to reconfigurable multi-agent systems. When a traditional multi-agent reinforcement learning algorithm is applied to the planning of a UAV cluster, each UAV is abstracted as an agent, whereas in a reconfigurable UAV cluster, multiple individual UAVs can be recombined into a new assembly. Therefore, a UAV module cannot simply be abstracted as an agent in MADDPG. In this thesis, the MADDPG algorithm is improved along two different lines, and improvement mechanisms are proposed, including an agent mapping mechanism, a leader decision allocation mechanism, and an embedded expert-experience reward mechanism. Experiments show that the improved algorithm performs well in reconfiguration and traversal, and that in multi-objective logistics transportation the transport capacity utilization rate can be increased from 38% to 81% while keeping the task completion rate above 95%, so that energy consumption is effectively reduced. (3) The reinforcement learning algorithm is deployed on the ZYNQ platform, with the forward computation of the reinforcement learning Actor network implemented in hardware. The experimental results show that the single decision time of the accelerator is 2.5 ms and the execution power consumption is only 3.3 W, which meets the requirements for edge-side deployment of reconfigurable UAV algorithms. |