With the continuous expansion of robot applications, robots in many scenarios must move through unknown environments and perceive their surroundings. The required information comprises two aspects: the robot's own position and observations of the surrounding environment. The technology that estimates both simultaneously is known as Simultaneous Localization and Mapping (SLAM), and visual SLAM has become an important research direction within it. Visual SLAM offers lower cost, good accuracy, and independence from specialized sensors, making it widely applicable in fields such as autonomous driving, robot patrolling, and robot rescue. However, currently available mature open-source SLAM systems are designed primarily for static environments. In environments containing dynamic objects, particularly objects with prominent textures that occupy large image areas as they move, the localization and mapping results often fail to meet the requirements of practical applications. Furthermore, although visual SLAM has been extensively researched and applied, it does not by itself provide the semantic information needed for higher-level human-robot interaction. Integrating visual SLAM with deep learning has emerged as a way to address both issues, so it is of significant practical importance to explore visual SLAM frameworks and algorithms that are more robust, adaptable, and practical in dynamic environments.

In view of the above, this paper focuses on indoor environments with dynamic objects and studies visual SLAM algorithms that use an RGB-D camera as the input device. First, to handle dynamic objects in real environments, a lightweight object detection network suitable for integration into a SLAM system is designed, with the goal of maximizing detection accuracy while maintaining sufficient operating speed. Second, a selection process is designed to distinguish dynamic from static objects based on the detection results. Finally, object detection and semantic segmentation are used to add semantic information to the SLAM map. Accordingly, this paper carries out research in the following aspects:

1. To meet the real-time requirements of SLAM, a lightweight neural network model combining the YOLOv5 architecture with RepVGG blocks is used, enabling fast object detection.

2. To address inaccurate discrimination between dynamic and static points, a dynamic object and dynamic feature point removal algorithm combining the optical flow method with the Grid-based Motion Statistics (GMS) algorithm is proposed, effectively improving the accuracy of feature matching and laying the foundation for accurate mapping.

3. To address the problem that SLAM alone cannot supply useful information for higher-level human-robot interaction, extraction of semantic information for feature points within object detection regions is added to the mapping thread to annotate the SLAM map, and a semantic segmentation module is integrated after real-time keyframe selection to display semantic information in real time.

4. The above designs are verified experimentally. First, the accuracy of the designed object detection and semantic segmentation networks is tested, and the training accuracy and detection results on the corresponding datasets are analyzed, confirming that the accuracy meets real-time application requirements. Second, the TUM dynamic dataset is used as the data source to test the recognition of dynamic feature points, showing that the combination of the optical flow method and the GMS algorithm can effectively identify them. Finally, the TUM dynamic dataset is used to perform pose estimation in dynamic environments and to verify the real-time semantic thread, showing that the system achieves the expected results in practice.
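As context for contribution 1: the key idea behind RepVGG is structural re-parameterization. At training time each block has parallel 3x3, 1x1, and identity branches; at inference time the three branches are folded into a single 3x3 convolution with no loss of accuracy. The single-channel NumPy sketch below (an illustration of the principle, not the thesis implementation; batch-norm folding is omitted) verifies that the fused kernel reproduces the three-branch output:

```python
import numpy as np

def conv3x3_same(x, k):
    """Plain single-channel 3x3 convolution (cross-correlation), zero-padded."""
    h, w = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def fuse_repvgg(k3, k1):
    """Fold the 1x1 and identity branches into the 3x3 kernel.

    Convolution is linear, so conv(x, k3) + k1*x + x == conv(x, k_fused)
    where k_fused adds (k1 + 1) at the kernel centre.
    """
    k_fused = k3.copy()
    k_fused[1, 1] += k1 + 1.0   # 1x1 branch + identity branch
    return k_fused

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
k3 = rng.standard_normal((3, 3))
k1 = rng.standard_normal()      # a 1x1 kernel on one channel is a scalar

train_time = conv3x3_same(x, k3) + k1 * x + x       # three parallel branches
infer_time = conv3x3_same(x, fuse_repvgg(k3, k1))   # one fused 3x3 conv
print(np.allclose(train_time, infer_time))          # True
```

This fusion is why the combined YOLOv5 + RepVGG model can keep its training-time representational power while running as a plain single-branch network at detection time.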
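For contribution 2, the optical-flow side of the dynamic/static discrimination can be illustrated by a much-simplified sketch: points on the static background share one dominant, camera-induced flow, while points on a moving object deviate from it. The function below is an assumption-laden toy version (the dominant motion is taken as the median flow, and the 1-pixel residual threshold is purely illustrative), not the thesis algorithm, which additionally validates matches with GMS:

```python
import numpy as np

def classify_dynamic_points(flow, thresh_px=1.0):
    """Flag feature points whose optical flow deviates from the dominant motion.

    flow: (N, 2) per-point displacement between two consecutive frames.
    The dominant (camera-induced) motion is estimated as the median flow;
    points whose residual exceeds `thresh_px` pixels are treated as dynamic.
    The threshold is illustrative, not taken from the thesis.
    """
    dominant = np.median(flow, axis=0)
    residual = np.linalg.norm(flow - dominant, axis=1)
    return residual > thresh_px   # True = dynamic feature point

# 30 background points share the camera motion (2, 1);
# 5 points on a moving object have a clearly different flow.
rng = np.random.default_rng(1)
static_flow = np.array([2.0, 1.0]) + 0.05 * rng.standard_normal((30, 2))
dynamic_flow = np.array([9.0, -4.0]) + 0.05 * rng.standard_normal((5, 2))
flow = np.vstack([static_flow, dynamic_flow])

mask = classify_dynamic_points(flow)
print(int(mask[:30].sum()), int(mask[30:].sum()))  # → 0 5
```

Feature points flagged this way are excluded from pose estimation and mapping, which is what improves matching accuracy in the dynamic TUM sequences.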
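For contribution 3, the core bookkeeping step of annotating the map is simple: each 2D feature point inherits the class label of the detection box that contains it. The sketch below shows this point-in-box assignment with hypothetical data structures (the label/box tuple format is assumed for illustration, not taken from the thesis):

```python
import numpy as np

def label_points(points, boxes):
    """Assign each 2D feature point the class label of the first detection
    box containing it, or None if no box does.

    points: (N, 2) pixel coordinates (u, v)
    boxes:  list of (label, u_min, v_min, u_max, v_max) detections
    """
    labels = []
    for u, v in points:
        hit = None
        for label, u0, v0, u1, v1 in boxes:
            if u0 <= u <= u1 and v0 <= v <= v1:
                hit = label
                break
        labels.append(hit)
    return labels

boxes = [("chair", 100, 50, 200, 180), ("monitor", 300, 40, 420, 160)]
points = np.array([[150.0, 100.0], [350.0, 90.0], [10.0, 10.0]])
print(label_points(points, boxes))  # ['chair', 'monitor', None]
```

In the mapping thread these labels travel with the feature points into the map, which is what makes the resulting SLAM map usable for human-robot interaction.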
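For the pose-estimation experiments of contribution 4, the TUM benchmark's standard metric is the Absolute Trajectory Error (ATE): the RMSE of translational differences between estimated and ground-truth camera positions at matched timestamps. The sketch below is a minimal illustration, not the official TUM evaluation script; it assumes the trajectories are already expressed in a common frame (the usual similarity-alignment step is omitted for brevity):

```python
import numpy as np

def ate_rmse(gt, est):
    """Absolute Trajectory Error: RMSE over per-pose translational differences.

    gt, est: (N, 3) camera positions at matched timestamps, assumed already
    aligned to a common reference frame.
    """
    err = np.linalg.norm(gt - est, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

gt = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=float)
est = gt + np.array([0.0, 0.03, 0.04])   # constant 5 cm offset
print(round(ate_rmse(gt, est), 3))       # 0.05
```

Comparing this metric with and without the dynamic-point removal of contribution 2 is the standard way to quantify how much the proposed system improves pose estimation on the dynamic TUM sequences.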