With the continuous development of science and technology, robotics has become a highly comprehensive, cutting-edge discipline. Robots are now used in many fields, such as medicine, industry, and education. In the industrial field in particular, industrial robots grasp parts and complete workpiece machining and parts assembly on the production line. In a robot's workflow, the maturity of its grasping technology is crucial. Autonomous robotic grasping is mostly controlled through vision techniques such as visual servoing and visual positioning. In a complex environment, however, recognition and positioning performance degrades significantly, which in turn reduces grasping efficiency. Providing accurate coordinates of the target object is therefore an urgent problem. In this paper, based on an optimized Mask R-CNN algorithm, the target object is identified in the RGB image and the target area is segmented with a mask. The algorithm is combined with the Kinect v2 sensor to convert two-dimensional coordinates into three-dimensional space coordinates and complete the object-positioning task. The specific work of this paper is as follows:

(1) Selection of the target-object recognition algorithm. Commonly used convolutional-neural-network detection algorithms in the field of object recognition, such as YOLO, SSD, Faster R-CNN, and Mask R-CNN, are analyzed. To improve the accuracy of object recognition and positioning, the Mask R-CNN algorithm, which has the highest recognition accuracy, is selected.

(2) Optimization of the Mask R-CNN network for object recognition and segmentation. First, the Kinect v2 color camera is used to collect data, and the data set is pre-processed. Second, the Mask R-CNN algorithm is improved: the backbone network is pre-processed by fusing the parameters of the convolutional layers and BN layers, so that the forward-inference speed of the backbone is increased through pre-computation. Then an auxiliary network is combined with the Scharr operator to add an edge loss to the total loss; experiments show that the improved network handles mask edge details better. Subsequently, a bottom-up connection is added on the basis of the FPN, which allows high-level features to merge with the location information of low-level features and improves the accuracy of target recognition. Finally, the optimized Mask R-CNN is used in recognition experiments under different lighting conditions and backgrounds. The experiments show that the algorithm is relatively insensitive to the environment and that its recognition accuracy does not decrease with changes in lighting.

(3) Three-dimensional positioning with the Kinect v2 sensor. First, to address noise and holes in the depth image, the image is pre-processed with a combination of multi-frame median filtering and bilateral filtering. The Zhang Zhengyou calibration method is used to determine the parameters of the Kinect v2 color camera, and the Kinect v2 sensor is used in a three-dimensional coordinate-calculation experiment. Experiments show that the average relative positioning error in the X-Y plane and along the Z axis is below 2%, which validates the effectiveness of Kinect v2 positioning.

(4) Construction of an experimental environment for object recognition and positioning based on the optimized Mask R-CNN algorithm and the Kinect v2 positioning model. Experimental results show that the relative error of object positioning is within 2% and the object recognition rate is 98%, demonstrating the feasibility of the approach.
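The Conv+BN parameter fusion mentioned in (2) can be sketched as follows. This is a minimal illustration of the standard folding identity, not the thesis's actual implementation: a 1x1 convolution at a single pixel reduces to a matrix multiply, so the example uses a linear layer; for a k x k kernel the same per-output-channel scaling is applied to the kernel weights.

```python
import numpy as np

def fuse_linear_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time BatchNorm into the preceding layer's weights.

    BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta, applied per
    output channel, is absorbed by rescaling each row of W and shifting b.
    """
    scale = gamma / np.sqrt(var + eps)          # per-channel scale factor
    W_fused = W * scale[:, None]                # scale each output row
    b_fused = (b - mean) * scale + beta         # fold mean/beta into bias
    return W_fused, b_fused

# Check that the fused layer matches layer-then-BN exactly.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.random(4) + 0.1
x = rng.normal(size=3)

y_ref = gamma * ((W @ x + b) - mean) / np.sqrt(var + 1e-5) + beta
W_f, b_f = fuse_linear_bn(W, b, gamma, beta, mean, var)
assert np.allclose(W_f @ x + b_f, y_ref)
```

Because the fused parameters are pre-computed once, the BN layer disappears from the forward pass entirely, which is the source of the backbone speed-up described above.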
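The edge loss in (2) relies on the Scharr operator to extract mask edges. The sketch below shows only the operator itself (a gradient-magnitude filter with the standard 3/10/3 kernels) on a small array; the thesis additionally combines it with an auxiliary network to form the loss term, which is not reproduced here.

```python
import numpy as np

# Standard Scharr kernels for horizontal and vertical gradients.
SCHARR_X = np.array([[-3,  0,  3],
                     [-10, 0, 10],
                     [-3,  0,  3]], dtype=float)
SCHARR_Y = SCHARR_X.T

def scharr_edges(img):
    """Gradient magnitude via the Scharr operator ('valid' region only)."""
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * SCHARR_X)
            gy[i, j] = np.sum(patch * SCHARR_Y)
    return np.hypot(gx, gy)   # sign is irrelevant for the magnitude

# A vertical step edge produces a strong response along the step only.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
mag = scharr_edges(img)
```

An edge loss can then be defined as a distance (e.g. L1) between the edge maps of the predicted and ground-truth masks, penalizing blurry mask boundaries.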
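The 2D-to-3D conversion in (3) follows the standard pinhole back-projection once the intrinsics (focal lengths and principal point) are known from Zhang's calibration. The sketch below uses illustrative intrinsic values, not the actual calibrated Kinect v2 parameters, and omits lens-distortion correction.

```python
def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with metric depth Z into camera
    coordinates using the pinhole model:
        X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy
    """
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    return X, Y, depth

# Illustrative intrinsics (NOT calibrated Kinect v2 values).
fx = fy = 1000.0
cx, cy = 960.0, 540.0
point = pixel_to_camera(1160, 740, 0.5, fx, fy, cx, cy)
```

With the depth image registered to the color frame, the depth value at the mask centroid returned by the segmentation network supplies Z, yielding the three-dimensional grasp coordinate.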