| With the development of e-commerce and artificial intelligence technology, building intelligent warehousing and logistics systems has become an emerging research hotspot. An important link in such systems is the robotic automatic bin packing task. In a typical packing scenario, the robot grasps the items to be packed one by one from a conveyor belt and packs them into a bin to complete a customer order, while wasting as little space in the bin as possible. To accomplish this task, researchers have formulated it as an online 3D bin packing problem and proposed corresponding bin packing optimization algorithms. Early bin packing methods based on heuristic rules involve no learning process, so the resulting strategies generalize poorly and struggle to find the best packing strategy across varied scenarios. Recently, methods based on deep reinforcement learning (DRL) have made some progress on the bin packing problem. However, for problems with large state and action spaces, DRL networks may suffer from a low training success rate and slow convergence. Moreover, existing methods ignore a crucial piece of human packing experience: when there is no suitable position for the current item, a human can temporarily free up space by removing some unsuitable items from the bin. After the current item is placed properly, the removed items can be put back into the bin. This operation is referred to as unpacking in this thesis. When only one item to be packed can be observed at a time, unpacking is very important for the bin packing problem. To address the above problems, this thesis proposes a DRL-based packing and unpacking method and designs a complete robotic packing and unpacking system, which completes autonomous bin packing tasks stably and reliably in both simulation and real-world scenarios. In terms of the bin packing strategy, this thesis introduces an unpacking mechanism into the online 3D bin packing problem and designs a DRL-based packing and unpacking network to learn the synergy between packing and unpacking operations. The network uses a dual-branch architecture to learn the state value and execution position of packing and unpacking actions, and the two branches are trained jointly under a DRL framework. To address slow convergence and low training efficiency, this thesis proposes three types of heuristic rules for packing and unpacking, and designs an action modulation mechanism that introduces these rules into the DRL framework to guide its training. Comparative and ablation experiments are conducted in a simulation environment to verify the effectiveness of the proposed method on the online 3D bin packing task. In terms of the design and implementation of the packing and unpacking system, this thesis first selects and builds the hardware platform on the basis of the proposed packing and unpacking strategy. Modules for visual detection, strategy migration, data communication, and host computer monitoring are then designed. Finally, a complete robotic packing and unpacking system is constructed. Experiments in the real world demonstrate that the system can stably and reliably complete the robotic automatic packing task in actual logistics scenarios. |
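The unpacking operation described above can be illustrated with a minimal sketch. It is not the DRL-based method of this thesis: a simple lowest-height placement rule stands in for the learned policy, items sit on an integer grid, and rotation and stability checks are omitted. All names (`Bin`, `find_position`, `pack_with_unpacking`) are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

Item = Tuple[int, int, int]             # (length, width, height), assumed integer sizes
Placed = Tuple[int, int, int, Item]     # (x, y, z, item)


class Bin:
    """A bin on an integer grid; items rest on top of whatever is below them."""

    def __init__(self, length: int, width: int, height: int) -> None:
        self.length, self.width, self.height = length, width, height
        self.placed: List[Placed] = []  # kept in placement order

    def heightmap(self) -> List[List[int]]:
        # Top surface height of every grid cell, rebuilt from the placed items.
        hm = [[0] * self.width for _ in range(self.length)]
        for x, y, z, (l, w, h) in self.placed:
            for i in range(x, x + l):
                for j in range(y, y + w):
                    hm[i][j] = max(hm[i][j], z + h)
        return hm

    def find_position(self, item: Item) -> Optional[Tuple[int, int, int]]:
        # Stand-in heuristic: choose the feasible position with the lowest resting height.
        l, w, h = item
        hm = self.heightmap()
        best: Optional[Tuple[int, int, int]] = None
        for x in range(self.length - l + 1):
            for y in range(self.width - w + 1):
                z = max(hm[i][j] for i in range(x, x + l) for j in range(y, y + w))
                if z + h <= self.height and (best is None or z < best[2]):
                    best = (x, y, z)
        return best

    def pack(self, item: Item) -> bool:
        pos = self.find_position(item)
        if pos is None:
            return False
        self.placed.append((*pos, item))
        return True


def pack_with_unpacking(bin_: Bin, item: Item) -> bool:
    """Pack directly if possible; otherwise unpack one item, pack, and put it back."""
    if bin_.pack(item):
        return True
    if not bin_.placed:
        return False
    snapshot = list(bin_.placed)
    # The most recently placed item has nothing resting on it, so it can be removed.
    removed = bin_.placed.pop()
    if bin_.pack(item) and bin_.pack(removed[3]):
        return True
    bin_.placed = snapshot              # unpacking did not help: restore the bin
    return False


# Only one incoming item is visible at a time, as in the online setting.
bin_ = Bin(2, 1, 2)
results = [pack_with_unpacking(bin_, it) for it in [(1, 1, 1), (1, 1, 1), (1, 1, 2)]]
print(results, len(bin_.placed))
```

In this toy run the third item fits only after the second item is temporarily removed and then placed back on top of the first. The sketch unpacks at most one item per step; the thesis instead learns when and where to unpack.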