| Augmented reality (AR) technology is one of the important gateways to the smart city. By calculating the pose of a target object in the real environment and then accurately superimposing digital city information at a specific position on that object, AR enhances real-time interaction between people and objects and has broad application prospects in smart factories, smart transportation, smart tourism and other fields. Six-degrees-of-freedom (6-DoF) object pose estimation is one of the core technologies of AR: it first calculates the 6-DoF pose of the target object from the input image and then derives the 6-DoF pose of the virtual information. However, the diversity of target objects and the complexity and variability of real environments make 6-DoF pose estimation difficult. Using deep learning, this paper therefore addresses two problems: the low accuracy of 6-DoF pose estimation for small, weakly textured objects, and poor robustness to occlusion. The specific research content is as follows. 1. An improved online data augmentation method using Gaussian filtering and Gaussian noise injection is proposed. Because it is difficult to label the 6-DoF pose of target objects in real images, this paper uses synthetic images of objects with known 6-DoF poses to expand the real-image dataset. However, synthetic images struggle to reproduce factors such as illumination changes and motion blur in real shooting, and they contain significant artifacts, which leads to poor model generalization. Therefore, building on the FFB6D online data augmentation method, the Gaussian filter hyperparameters are tuned to smooth the artifacts fully, preventing the model from locating the target object region by relying on features such as artifacts. Gaussian noise injection is added to simulate real images and 
enhance the model's ability to filter out noise interference and redundant information. 2. A 6-DoF object pose estimation model based on feature fusion and attention mechanisms (FA6D) is proposed. The complexity and variability of real environments, together with the human eye’s strong ability to detect errors, demand high accuracy and strong robustness from 6-DoF object pose estimation. First, the Convolutional Block Attention Module (CBAM) is added to the first convolution module of the RGB image feature extraction network to improve the regional saliency of small, weakly textured objects. Second, CBAM-based skip connections are introduced into the encoder-decoder RGB image feature extraction network to compensate for the lack of detailed appearance features in the deep pose semantic features. Third, the Channel Attention Module is used to improve the Pyramid Pooling Module, strengthening the connection between the visible and occluded regions of the target object and improving robustness to occlusion. Fourth, CBAM is used to reconstruct the features in the decoding stage, which are rich in pose semantic information, to better discriminate similar surface features and thus reduce the interference of similar-looking objects with 6-DoF pose estimation. Finally, the weight of the semantic segmentation loss function is tuned to improve the model's ability to identify the target object accurately in occluded environments. 3. An FA6D optimization method based on multi-modal feature fusion is proposed. Multi-modal feature fusion exploits the complementary advantages of appearance features and geometric features, but the key problem is to design an appropriate fusion strategy. First, the redundant bidirectional fusion module in the decoder stage of the FA6D feature extraction network is removed to prevent the model from 
introducing noise through the bidirectional fusion module in the decoder stage. Then, the RGB-to-point-cloud fusion function in the bidirectional fusion module is improved by using global average pooling, so that appearance context information is shared along with appearance saliency features and the target object can be identified from both appearance and geometry perspectives. The experimental results show that the proposed 6-DoF object pose estimation model for augmented reality achieves high pose estimation accuracy for small, weakly textured objects and strong robustness to occlusion. |
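
The online augmentation step described in the first contribution (Gaussian filtering to smooth synthetic artifacts, followed by Gaussian noise injection) can be illustrated with a minimal NumPy sketch. This is not the thesis's or FFB6D's implementation: the function names, the probabilities, and the `sigma_range` and `noise_std_range` values are hypothetical placeholders, since the abstract does not state the tuned hyperparameters.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float, radius: int) -> np.ndarray:
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of an H x W x C image, using reflect padding."""
    radius = max(1, int(3 * sigma + 0.5))
    k = gaussian_kernel_1d(sigma, radius)
    out = img.astype(np.float64)
    for axis in (0, 1):  # filter along rows, then along columns
        pad = [(0, 0)] * out.ndim
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="reflect")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, padded
        )
    return out


def augment(
    img: np.ndarray,
    rng: np.random.Generator,
    blur_prob: float = 0.5,                # assumed value, not from the source
    sigma_range: tuple = (0.5, 2.0),       # assumed value, not from the source
    noise_prob: float = 0.5,               # assumed value, not from the source
    noise_std_range: tuple = (0.0, 0.03),  # assumed value, not from the source
) -> np.ndarray:
    """Randomly blur, then add Gaussian noise to, a float image in [0, 1]."""
    out = img.astype(np.float64)
    if rng.random() < blur_prob:
        out = gaussian_blur(out, rng.uniform(*sigma_range))
    if rng.random() < noise_prob:
        std = rng.uniform(*noise_std_range)
        out = out + rng.normal(0.0, std, size=out.shape)
    return np.clip(out, 0.0, 1.0)
```

In a training loop, `augment(image, np.random.default_rng())` would be called on each synthetic RGB image as it is loaded, so every epoch sees a differently blurred and perturbed version of the same render.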