| In recent years, the field of computer vision has seen a surge in research activity, with many researchers exploring the possibilities of deep learning. In particular, there has been growing interest in image classification, semantic segmentation, object detection, and instance segmentation. Instance segmentation resembles semantic segmentation in that it predicts object classes and masks, but it must also distinguish between different instances of the same class, which lets it detect and delineate individual objects across a wide variety of images. However, fully supervised instance segmentation requires datasets labeled at the pixel level, and building such datasets is extremely time-consuming and labor-intensive. Semi-supervised and weakly-supervised methods have therefore been proposed in recent years for training instance segmentation models, replacing costly pixel-level labels with weaker but more affordable forms of supervision. Weakly-supervised instance segmentation, which typically relies on image-level annotations rather than pixel-level labels, is one such approach. Although this supervision is less precise than full supervision, it can still be highly effective while offering significant savings in cost and effort. However, because image-level annotations provide no instance-specific location information, more elaborate processing is needed to localize object regions. To address these difficulties, this thesis studies weakly-supervised instance segmentation based on image-level annotations and improves the model to raise its segmentation quality. The research content of this thesis is as follows.
We propose a self-supervised attention transfer mechanism (SATM) that retrains the classification network so that it captures hidden activations of objects. First, we apply image augmentation that adjusts the contrast of the original image and feed the augmented images into the classification network. Then, we merge the resulting class activation maps (CAMs) and compare the merged map with the CAM of the original image in order to transfer the classifier's attention. Finally, we define a Similarity Loss, use it to retrain the classification network, and output the final CAM. SATM enables the classifier not only to attend to the salient regions of an object but also to capture information from suppressed activation regions. In addition, we explore a new module, pixel relevance focused-unfocused, to better integrate pixel context information. For pixel relevance focused, we add an attention mechanism dedicated to extracting pixel relationships, so that each pixel attends more strongly to itself and fine pixel-wise information is revealed. To obtain pixel relevance that is not limited to a narrow focus, we use multi-scale atrous convolutions to expand the receptive field. Next, three loss functions are designed to guide training: a classification loss, a similarity loss, and a background similarity loss, each imposing a different supervisory constraint. Finally, to assess the contribution of each module, this thesis conducts experiments on the PASCAL VOC 2012 dataset using image-level class labels, presents the weakly-supervised instance segmentation results, and compares them with existing weakly-supervised segmentation methods. The comparative experiments show that the proposed model outperforms the other methods in both instance segmentation mAP and semantic segmentation mIoU. |
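
The SATM training signal described above can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the text does not state: `toy_backbone` is a hypothetical 1x1-projection stand-in for the classification network; CAMs are computed in the common way (classifier weights applied to the final feature maps, then ReLU and per-class max normalization); the contrast factors 0.6 and 1.4 are arbitrary; the augmented CAMs are merged by element-wise maximum; the Similarity Loss is taken to be a mean absolute difference; and the classification loss is multi-label binary cross-entropy on average-pooled scores. The background similarity loss is omitted because its form is not described. Only the forward computation is shown; actual retraining would backpropagate this loss in a deep learning framework.

```python
import numpy as np


def toy_backbone(image, proj):
    """Hypothetical stand-in for a CNN: a 1x1 channel projection followed by ReLU.

    image: (3, H, W) in [0, 1]; proj: (K, 3). Returns (K, H, W) feature maps.
    """
    return np.maximum(np.einsum("kc,chw->khw", proj, image), 0.0)


def compute_cam(features, class_weights, eps=1e-5):
    """CAM_c = sum_k w[c, k] * f_k, then ReLU and per-class max normalization to [0, 1)."""
    cam = np.maximum(np.einsum("ck,khw->chw", class_weights, features), 0.0)
    peak = cam.max(axis=(1, 2), keepdims=True)
    return cam / (peak + eps)


def adjust_contrast(image, factor):
    """Scale each channel's deviation from its mean by `factor`, then clip to [0, 1]."""
    mean = image.mean(axis=(1, 2), keepdims=True)
    return np.clip(mean + factor * (image - mean), 0.0, 1.0)


def classification_loss(features, class_weights, labels, eps=1e-7):
    """Assumed form: multi-label BCE on globally average-pooled class scores."""
    scores = class_weights @ features.mean(axis=(1, 2))  # shape (C,)
    probs = np.clip(1.0 / (1.0 + np.exp(-scores)), eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))


def similarity_loss(cam_orig, cam_merged):
    """Assumed form: mean absolute difference between original and merged CAMs."""
    return float(np.abs(cam_orig - cam_merged).mean())


def satm_step(image, labels, proj, class_weights, factors=(0.6, 1.4), lam=1.0):
    """Forward pass of one SATM-style step; returns (total loss, merged CAM)."""
    feats = toy_backbone(image, proj)
    cam_orig = compute_cam(feats, class_weights)
    # CAMs of the contrast-adjusted views, merged by element-wise max (assumption).
    aug_cams = [
        compute_cam(toy_backbone(adjust_contrast(image, f), proj), class_weights)
        for f in factors
    ]
    cam_merged = np.max(np.stack(aug_cams), axis=0)
    # In a real framework the merged CAM would be treated as a fixed target, so the
    # similarity term pulls the original CAM toward the regions the augmented views reveal.
    loss = classification_loss(feats, class_weights, labels)
    loss += lam * similarity_loss(cam_orig, cam_merged)
    return loss, cam_merged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((3, 32, 32))
    proj = rng.standard_normal((8, 3))
    class_weights = rng.standard_normal((20, 8))  # 20 PASCAL VOC object classes
    labels = np.zeros(20)
    labels[[0, 14]] = 1.0  # hypothetical image-level labels
    loss, cam = satm_step(image, labels, proj, class_weights)
    print(cam.shape, loss)
```

The element-wise maximum is one plausible way to merge the augmented CAMs, chosen here because it keeps any region that either contrast view activates; averaging would be an equally reasonable alternative under different assumptions.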