Currently, there is an urgent need in China's decoration market to develop decoration robots that can perform complex work in place of humans. One of the key technologies for such robots is autonomous navigation, which enables a robot to reach its destination quickly in any indoor decoration scene. Autonomous navigation based on deep reinforcement learning is well suited to this task: using the environmental information the robot perceives, it can guide the robot to the target quickly without a map. Previous studies of navigation algorithms based on both vision sensors and LiDAR sensors face two pressing problems: slow training and poor generality. To address these problems in the two navigation methods, this thesis carries out the following work.

First, this thesis proposes a Transformer-based visual navigation method (TVA), which helps the agent adapt to a new environment more quickly. Building on the existing teacher-student framework, the TVA algorithm adds two Transformer modules after the conventional convolutional neural network module, so that the agent attends more strongly to important visual features, improving the convergence speed and adaptability of the algorithm. The effectiveness of TVA is verified in a virtual indoor decoration environment built with ViZDoom.

Second, this thesis proposes a LiDAR navigation algorithm based on the Mean Deep Deterministic Policy Gradient (Mean-DDPG) algorithm. The Mean-DDPG network architecture alleviates the Q-value overestimation problem; a Gaussian behavior policy with a decay factor helps the agent explore the environment more effectively; and a multi-experience structure helps the agent learn from the most recent and most important experiences. Together, these three improvements effectively increase the training speed and generality of the algorithm. Furthermore, we build a virtual environment for studying indoor mobile-robot navigation using Gazebo and ROS, and propose a new way to construct the reward function, combining a base reward with an additional potential-energy-based reward, which effectively helps the algorithm converge. Finally, we compare the navigation performance of the Mean-DDPG algorithm with that of the standard DDPG algorithm and demonstrate the navigation capability of Mean-DDPG.
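The abstract names two of Mean-DDPG's core ideas without giving details: averaging several critics to temper Q-value overestimation, and Gaussian exploration noise that decays over training. A minimal sketch of both ideas follows; all function names, the exponential decay schedule, and the numeric defaults are illustrative assumptions, not the thesis's actual implementation.

```python
import math
import random

def mean_q_target(critics, next_state, next_action, reward, gamma=0.99):
    """Form the TD target from the MEAN of several target critics'
    estimates, rather than a single (typically optimistic) estimate,
    to temper Q overestimation. `critics` is a list of callables
    Q(s, a) -> float (illustrative stand-ins for target networks)."""
    q_mean = sum(q(next_state, next_action) for q in critics) / len(critics)
    return reward + gamma * q_mean

def gaussian_exploration(action, step, sigma0=0.5, decay=1e-4, sigma_min=0.05):
    """Add zero-mean Gaussian noise whose scale decays with the training
    step, so exploration is wide early in training and narrow later."""
    sigma = max(sigma_min, sigma0 * math.exp(-decay * step))
    return [a + random.gauss(0.0, sigma) for a in action]
```

For example, with two critics that estimate 1.0 and 3.0 for the next state-action pair and a reward of 0.5, the target uses their mean 2.0 rather than either extreme.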
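The abstract's "multi-experience structure" for learning from the most recent and most important experiences is not specified further. One plausible sketch, assuming a mix of a small recency buffer and a priority-weighted main buffer (the class name, sampling split, and weighting scheme are all assumptions):

```python
import random
from collections import deque

class MultiExperienceBuffer:
    """Illustrative replay buffer that mixes recent transitions with
    priority-weighted draws from the full history."""
    def __init__(self, capacity=10000, recent_len=100):
        self.main = deque(maxlen=capacity)      # (priority, transition)
        self.recent = deque(maxlen=recent_len)  # most recent transitions

    def add(self, transition, priority=1.0):
        self.main.append((priority, transition))
        self.recent.append(transition)

    def sample(self, batch_size, recent_frac=0.25):
        # A fraction of the batch comes from the most recent transitions...
        n_recent = min(int(batch_size * recent_frac), len(self.recent))
        batch = random.sample(list(self.recent), n_recent)
        # ...and the rest are priority-weighted draws (with replacement).
        pool = list(self.main)
        weights = [p for p, _ in pool]
        batch += [t for _, t in random.choices(pool, weights=weights,
                                               k=batch_size - n_recent)]
        return batch
```

Giving higher-TD-error transitions larger priorities would then bias the main-buffer draws toward the "most important" experiences the abstract mentions.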
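The reward construction described above, a base reward plus an additional potential-energy term, resembles classical potential-based reward shaping. A minimal sketch, assuming the potential is the negative Euclidean distance to the goal (an assumption; the thesis's exact potential function is not given in this abstract):

```python
import math

def potential(pos, goal):
    """Illustrative potential: negative Euclidean distance to the goal,
    so the potential rises as the robot approaches the target."""
    return -math.dist(pos, goal)

def shaped_reward(base_reward, pos, next_pos, goal, gamma=0.99):
    """Base reward (e.g. arrival bonus, collision penalty) plus the
    potential-based shaping term gamma * Phi(s') - Phi(s)."""
    return base_reward + gamma * potential(next_pos, goal) - potential(pos, goal)
```

With this construction, a step that moves the robot toward the goal yields a positive shaping term and a step that moves it away yields a negative one, which gives the agent a dense learning signal even before it ever reaches the target.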