
Image Recognition And Segmentation Based On Semi-Supervised And Unsupervised Deep Learning Models

Posted on: 2023-03-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y L Wu
GTID: 1528306905471364
Subject: Information and Communication Engineering
Abstract/Summary:
Training deep neural networks typically requires a large amount of labeled data, which is expensive to obtain in practical applications. Semi-supervised and unsupervised learning have recently attracted much attention because they leverage hidden structure learned from unlabeled data to reduce, or even eliminate, the need for labels. The key question for these methods is how to automatically extract abstract feature representations from unlabeled data so as to improve the robustness of the model. Current semi-supervised and unsupervised methods suffer from weak consistency, limited data augmentation, and poorly designed auxiliary tasks, so model design still needs further exploration. The main contributions of this thesis are as follows:

(1) A semi-supervised multi-matching recognition model based on mutual information maximization and data augmentation is proposed, named Multi-Match. The model includes a simple augmentation branch and a complex augmentation branch. To ensure the consistency of information, a mutual information loss is introduced in the simple augmentation branch to maximize the mutual information not only between the input and the output representation, but also between the outputs. In addition, because existing information dropping methods choose the dropped regions poorly, an efficient information dropping method, CutEdge, is proposed and applied to the complex augmentation branch to broaden the set of data augmentations. The method removes multiple regions at the edges of the input with a certain probability, thus ensuring that part of the object is removed. Finally, the model adds a consistency loss that pulls the output of the complex augmentation branch toward the output of the simple augmentation branch, which further enhances the robustness of the semi-supervised image recognition model. Experimental results on the CIFAR-10, CIFAR-100 and SVHN datasets with different label sizes demonstrate that the model effectively improves recognition accuracy and reduces the dependence on labeled data.

(2) A semi-supervised consistency segmentation model based on perturbation consistency and mutual information regularization is proposed, named SCSeg. The key to the success of semi-supervised recognition is to add perturbations to low-density regions of unlabeled data while keeping the output consistent. Similar ideas can be applied to semi-supervised segmentation. Previous work has shown that, in segmentation tasks, the low-density regions appear in the features output by the encoder, yet existing methods add only a single perturbation to these encoded features. SCSeg refines how perturbations are added: at each parameter update, different types of perturbation are applied to the encoded features, and the decoded output of the perturbed features is made consistent with the decoded output of the uncorrupted features. In addition, because a pixel-level consistency loss cannot measure the spatial relationship between pixels, a regional mutual information loss is introduced. By maximizing the mutual information of adjacent patches between the prediction and its pseudo-label, the model attends to both region-level and pixel-level consistency, which further improves the effect of consistency regularization. Experimental results on the Pascal VOC 2012 and Cityscapes datasets with different label sizes demonstrate that the model obtains higher mean Intersection-over-Union and sharper segmentation boundaries.

(3) An object location segmentation model based on a pretrained detection network and unsupervised learning is proposed, named OLSeg. Annotations for image segmentation are expensive and time-consuming to produce, whereas labeled training data for object detection is in general easier to acquire. Combining a pretrained object-detection network with unsupervised learning therefore avoids the need for segmentation labels. An auxiliary task based on the sparse decomposition of object instances in videos is designed to obtain the segmentation masks of the objects; it benefits from the sparsity of image instances and the inter-frame structure of videos. To improve the accuracy of identifying the "right" object, a pretrained object-detection network provides the location information of the object instances. The model is trained on videos and can recover the foreground, the background and the segmentation mask from a single image. The performance gain comes from the sparsity of object instances (the foreground and background) and the location information (bounding box prior), which work together to produce a comprehensive and robust visual representation of the input. Experimental results on the YouTube Objects, Internet and MSRC datasets demonstrate that the model effectively improves segmentation performance and obtains sharper segmentation boundaries.
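
The abstract describes CutEdge only in words: with a certain probability, several regions at the edges of the input are removed. The following is a minimal NumPy sketch of that idea, not the thesis implementation. The function name `cut_edge`, the parameters `p`, `n_regions` and `max_frac`, their default values, and the choice of rectangular regions filled with zeros are all assumptions made for illustration.

```python
import numpy as np


def cut_edge(image, p=0.5, n_regions=2, max_frac=0.3, rng=None):
    """Zero out rectangular regions that touch the image border.

    Hypothetical sketch: every parameter name and default is an assumption.
    image: array of shape (H, W) or (H, W, C).
    p: probability of applying the augmentation at all.
    n_regions: number of border regions to remove.
    max_frac: largest region side, as a fraction of the image side (<= 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    if rng.random() >= p:
        return out  # leave the image unchanged with probability 1 - p
    h, w = out.shape[:2]
    for _ in range(n_regions):
        # Region size: at least one pixel, at most max_frac of each side.
        rh = int(rng.integers(1, max(2, int(h * max_frac) + 1)))
        rw = int(rng.integers(1, max(2, int(w * max_frac) + 1)))
        side = int(rng.integers(0, 4))  # 0=top, 1=bottom, 2=left, 3=right
        if side in (0, 1):
            # Slide along the top or bottom edge.
            x0 = int(rng.integers(0, w - rw + 1))
            y0 = 0 if side == 0 else h - rh
        else:
            # Slide along the left or right edge.
            y0 = int(rng.integers(0, h - rh + 1))
            x0 = 0 if side == 2 else w - rw
        out[y0:y0 + rh, x0:x0 + rw] = 0
    return out
```

In a two-branch setup like the one described for Multi-Match, a function of this kind would be applied only in the complex augmentation branch, while the simple branch sees a lightly augmented copy of the same image.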
Keywords/Search Tags: deep learning, semi-supervised learning, unsupervised learning, mutual information, data augmentation, image recognition, image segmentation