| To meet a key technical requirement of the "unmanned crane handling intelligent measurement and control system" project, namely the automatic identification, positioning, and measurement of loading and unloading objects, this thesis studies instance segmentation of objects in complex scenes, providing key technical support for intelligent lifting and unloading. Machine vision is a feasible technology for object recognition and measurement. However, both the environment and the handled objects in lifting and unloading scenarios are complex and involve many uncertain factors, which makes applying this technology very difficult; the main difficulty lies in achieving instance segmentation. At present, high-accuracy instance segmentation techniques are all based on supervised deep learning. Based on the actual needs of the project, this thesis studies deep-learning-based instance segmentation with two goals. The first is a method for generating annotated datasets: by labeling only a few samples, a large-scale, high-quality labeled dataset is built, which solves the lack of training data and provides sufficient data for supervised learning models. The second is an instance segmentation method for objects in complex scenes, which accurately identifies objects and provides the basis for accurate positioning and measurement. The specific research work and results are as follows: 1. To address the lack of annotated data for lifting objects, a generative adversarial network is designed to construct an accurate dataset containing semantic labels and keypoint labels. Taking DatasetGAN as the base network, the problems encountered in practical application are addressed as follows: (1) Semantic feature deformation: the sample normalization layer of the generator is modified to remove the mean operation, and the input mode of the noise module and the style control factor is modified; (2) To solve the problem of
weak spatial position encoding for lifting objects with a single texture feature: the constant input of the generator network is replaced by Fourier features, and a module integrating nonlinear up-sampling and down-sampling is proposed; (3) Objective function: the WGAN-GP objective function is introduced to eliminate the tendency of gradients to vanish when distinguishing the distribution of the loading and unloading target data from that of real data. Using DeepLab-V3 as the evaluation network and DatasetGAN as the baseline, the test results show that in the semantic label generation task the mIoU of DeepLab-V3 increases by 14.83%. In the keypoint label generation task, the L2 loss decreases by 0.4×10⁻⁴ on average and the PCK increases by 5.06% on average. These results verify the feasibility and advancement of the improved generative adversarial network for generating semantic and keypoint annotation data. 2. A backbone network for instance segmentation in complex lifting and unloading scenarios is studied and constructed. By analyzing the feature-extraction backbones of existing instance segmentation networks, a parallel structure of a CNN and a Swin Transformer is constructed to strengthen the backbone's ability to encode long-range dependencies and depth cues in complex lifting and loading scenarios. A fusion module is then designed to fully fuse the heterogeneous feature information. The constructed backbone can effectively extract robust semantic and spatial features in complex loading and unloading scenarios. 3. An instance segmentation detection network for complex lifting and loading scenarios is constructed to improve the accuracy of the segmentation masks. Polygons of each instance are first generated with Dense RepPoints instead of masks, and a Transformer-based polygon deformation network is then applied to further refine the edge contour. The network predicts the displacement of each vertex of a polygon, taking into account the positions of all vertices. By
deforming polygons, the model better learns the local geometry of the captured object and produces a more accurate mask. Compared with other methods, the instance segmentation model composed of the backbone network and the detection network achieves the best segmentation results in loading and unloading scenarios: AP increases by 5.64% to 99.51%, and mIoU increases by 10.36% to 96.83%. |
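
The results above are reported as mIoU (semantic labels and masks) and PCK (keypoint labels). As a reading aid, the sketch below shows how these two metrics are commonly computed. It is a minimal illustration and not the thesis's evaluation code: the function names `mean_iou` and `pck`, the flat label-list input, the reference size `ref_size`, and the threshold factor `alpha=0.1` are all assumptions chosen for the example, and the thesis may use different conventions.

```python
import math


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes.

    pred and target are flat sequences of integer class labels
    (an assumed input format). Classes absent from both are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def pck(pred_pts, gt_pts, ref_size, alpha=0.1):
    """Percentage of Correct Keypoints.

    A predicted keypoint counts as correct when its Euclidean distance
    to the ground truth is at most alpha * ref_size. Both ref_size
    (e.g. a bounding-box side length) and alpha are assumed here.
    """
    threshold = alpha * ref_size
    hits = sum(1 for p, g in zip(pred_pts, gt_pts) if math.dist(p, g) <= threshold)
    return hits / len(gt_pts)


# Toy inputs, invented purely for illustration.
toy_miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
toy_pck = pck([(10, 10), (50, 52), (90, 40)],
              [(10, 12), (50, 50), (70, 40)],
              ref_size=100)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12; two of the three keypoints fall within the 10-pixel threshold, giving a PCK of 2/3.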