| Fruit is China’s third-largest planting industry, and China has the largest fruit planting area and consumer market in the world. Nevertheless, China’s fruit import and export trade runs a deficit. The main reason is that postharvest processing, especially fruit sorting, relies heavily on manual labor, which leads to high cost and low efficiency. It is therefore necessary to raise the intelligence level of China’s fruit industry chain and thereby help increase fruit exports. This paper studies stacked fruit detection algorithms based on RGB-D visual saliency. The aim is to detect the fruit target most suitable for robotic grasping using low-cost, compact equipment, and to propose several algorithms that meet different accuracy and efficiency requirements, so as to promote the intelligent and efficient transformation of fruit sorting. The main research contents are as follows: (1) Construction of an RGB-D dataset of stacked fruits. The image acquisition equipment is selected by jointly considering the acquisition environment, cost, and accuracy; image registration and data augmentation are completed; and the dataset required for this paper is produced. In addition, by analyzing the image characteristics of stacked fruits of the same kind, the lighting conditions during image acquisition, and the shortcomings of existing algorithms, the difficulties of the detection task are identified and the technical route is designed. (2) An RGB-D visual saliency detection network based on bi-directional selection and dense feature extraction. This network targets applications that need a balance between efficiency and accuracy. First, ResNeXt-101 is used as the backbone network for feature extraction. Then, to select features that enhance the salient regions of both the RGB images and the depth images at the same 
time, a bi-directional selection module is introduced. To address insufficient cross-modal feature extraction, which leads to redundant computation and low accuracy, a dense extraction module is introduced. Finally, a feature aggregation module cascades and fuses the dense features, and a recurrent residual refinement aggregation module is combined with deep supervision to progressively refine the coarse saliency maps. Experiments show that the saliency maps predicted by this algorithm are closer to the ground truth maps than those of the comparison algorithms, and that the algorithm can detect the foreground fruit target most suitable for robotic grasping in the same stacking state. (3) An RGB-D visual saliency detection network with multi-scale progressive fusion. This network targets high-precision detection. First, Res2Net-101 is used as the backbone network for feature extraction. Then, to exploit the complementary strengths of RGB features and depth features, a depth-weighted preprocessing module is introduced to purify the input RGB images and attenuate noise and other misleading features, so that unstable input image quality does not degrade detection accuracy. Next, to increase information interaction between branches of different scales and to better balance fused features and modality-specific features, a multi-scale progressive fusion module is proposed, so that the output of the low-level module participates in deeper processing throughout, rather than being simply cascaded with the results. Finally, to minimize the differences among the initial saliency maps generated from different features and to improve the accuracy of the final predicted saliency maps, a hybrid supervision method is applied to the initial saliency maps generated by multiple branches in the combined decoding stage, which accelerates the 
convergence of the algorithm and improves the accuracy of saliency inference. Experiments show that the algorithm significantly outperforms the comparison algorithms on the evaluation metrics, that its predicted saliency maps are very close to the ground truth maps, and that it achieves high-precision detection of stacked fruits. (4) A three-level lightweight RGB-D visual saliency detection network based on cross-modal features. This network targets high-efficiency detection. First, MobileNetV3 is used as the backbone network. Then, to effectively distinguish different targets with similar depth values, the low-level detail features are enhanced. To exploit the respective strengths of depth information and RGB information, cross-modal complementary operations are applied to the middle-level features. To reduce the number of parameters, a shared backbone is used at the high level. Finally, to decode sufficiently and prevent information loss, a three-level, two-stage decoder is adopted. Experiments show that the detection accuracy of this algorithm is higher than that of the comparison algorithms and that the model is smaller, giving it a clear efficiency advantage and enabling high-efficiency detection of stacked fruits. |
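
The abstract states that the predicted saliency maps are compared against ground truth maps but does not name the evaluation metrics used. As an illustration only, the sketch below computes the mean absolute error (MAE), a metric commonly used in saliency detection, between a predicted map and a ground truth map with pixel values in [0, 1]. The function name `mean_absolute_error` and the 2x2 toy maps are hypothetical and are not taken from the paper; whether the paper uses MAE is an assumption.

```python
from typing import Sequence

# A saliency map is represented here as rows of pixel values in [0, 1].
Map = Sequence[Sequence[float]]


def mean_absolute_error(pred: Map, truth: Map) -> float:
    """Return the mean absolute per-pixel difference between two maps."""
    # Both maps must have the same number of rows and the same row lengths.
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("maps must have the same shape")
    n_pixels = sum(len(row) for row in pred)
    if n_pixels == 0:
        raise ValueError("maps must not be empty")
    total = sum(
        abs(p - t)
        for p_row, t_row in zip(pred, truth)
        for p, t in zip(p_row, t_row)
    )
    return total / n_pixels


# Hypothetical toy maps for illustration, not data from the paper.
toy_truth = [[1.0, 0.0], [0.0, 1.0]]
toy_pred = [[0.75, 0.25], [0.0, 1.0]]
toy_mae = mean_absolute_error(toy_pred, toy_truth)
```

A lower MAE means the predicted map is closer to the ground truth map; a value of 0 means the two maps are identical.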