
Image Semantic Segmentation Based On Lightweight Fusion And Knowledge Distillation

Posted on: 2024-02-08 | Degree: Master | Type: Thesis
Country: China | Candidate: J J Chen | Full Text: PDF
GTID: 2568307103974519 | Subject: Computer Science and Technology
Abstract/Summary:
Image semantic segmentation aims to assign a category label to every pixel of a given image and to output a label map of the same size as the input. It has broad applications in autonomous driving, drone reconnaissance, and intelligent monitoring. This thesis studies image semantic segmentation under different learning paradigms. For fully-supervised multi-spectral image semantic segmentation, existing methods suffer from slow inference and insufficient fusion of multi-spectral features. For semi-supervised image semantic segmentation, existing methods have limited ability to capture contextual information and are likewise slow at inference. For unsupervised image semantic segmentation, existing methods rarely consider the few-sample setting, and training directly on a small number of unlabeled samples rarely yields satisfactory performance. To address these problems, this thesis carries out the following work:

(1) To tackle slow inference, unreliable spectral features, and insufficient feature fusion in fully-supervised multi-spectral image semantic segmentation, a Residual Spatial Fusion Network (RSFN) is proposed. First, an asymmetric encoder extracts the multi-spectral features, reducing model parameters and computation. Second, a multi-spectral illumination-aware strategy generates pseudo-labels that guide the model to produce accurate fusion confidences for the multi-spectral features, improving its ability to judge the reliability of the different spectral images. Third, a lightweight Residual Spatial Fusion module mines and fuses the complementary information of the two spectral features; the fusion confidence produced by the model controls the fusion ratio of the cross-modal features, reducing the interference of unreliable spectral features. Finally, the fusion module is expanded into multiple branches during training to strengthen multi-scale feature learning while leaving the inference speed unchanged, further improving segmentation accuracy.
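To make the confidence-controlled cross-modal fusion in research work (1) concrete, the following is a minimal sketch of the general idea rather than the thesis's actual module: the class name, tensor shapes, and the way the per-pixel confidence is predicted are all assumptions.

```python
import torch
import torch.nn as nn

class ConfidenceWeightedFusion(nn.Module):
    """Hypothetical sketch: a per-pixel confidence map decides how much each
    spectral branch contributes, and the fused result is added back as a
    residual so that unreliable spectral features interfere less."""
    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel fusion confidence from both modalities.
        self.confidence = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb_feat: torch.Tensor, thermal_feat: torch.Tensor):
        # alpha in [0, 1]: estimated reliability of the RGB branch per pixel.
        alpha = self.confidence(torch.cat([rgb_feat, thermal_feat], dim=1))
        fused = alpha * rgb_feat + (1.0 - alpha) * thermal_feat
        # Residual connection preserves the original spatial features.
        return rgb_feat + self.refine(fused)

# Usage with dummy multi-spectral features of shape (N, C, H, W).
rgb = torch.randn(2, 64, 60, 80)
thermal = torch.randn(2, 64, 60, 80)
print(ConfidenceWeightedFusion(64)(rgb, thermal).shape)  # (2, 64, 60, 80)
```

Predicting the confidence as a single per-pixel map keeps the fusion step lightweight, which matches the accuracy-speed balance emphasized above; in the thesis the confidence is additionally guided by the illumination-aware pseudo-labels.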
(2) To tackle the limited ability to capture contextual information and the slow inference speed in semi-supervised image semantic segmentation, a Triple-View Network (TriVN) is proposed. On the one hand, a consistency-regularization training strategy for the triple-view encoder is proposed: encoders with different structures extract different types of features, and knowledge distillation helps the model learn complementary features, improving its ability to capture both local details and global contextual information. On the other hand, a frequency-division decoder is proposed: the encoded features are mapped from the spatial domain to the frequency domain, the importance of the feature maps at different levels is computed from their frequency-domain characteristics, and the original features are then reweighted by these importance coefficients so that important features are kept and redundant ones removed, which reduces the model's computation and memory usage and improves inference speed.

(3) To tackle the weak knowledge-transfer ability of models in few-sample unsupervised image semantic segmentation, a Two-phase Distillation Network (TDN) is proposed. The method adopts two-phase knowledge distillation, using a teacher-assistant network, whose size lies between the teacher network and the student network, as a bridge: in the first phase the teacher assistant learns from the teacher by distillation, and in the second phase the learned knowledge is transferred to the student network by distillation. In addition, knowledge distillation for the decoder is proposed, which fits the distribution of the global features extracted by the context module through a channel-level loss constraint, improving the student network's understanding of the global scene.

Research work (1) conducts extensive quantitative, ablation, and qualitative experiments on the multi-spectral semantic segmentation datasets MFNet and PST900. Evaluated on mIoU, mAcc, FPS, model parameters, and computational cost, the proposed RSFN shows excellent performance with a lightweight model, achieving a good balance between segmentation accuracy and speed. Research work (2) conducts extensive quantitative, ablation, and qualitative experiments on the Pascal VOC 2012 and Cityscapes datasets; evaluated on mIoU, FPS, model parameters, and computational cost, the proposed TriVN has better overall performance and likewise balances accuracy and speed well. Research work (3) conducts extensive experiments on Pascal VOC 2012 and Cityscapes; the quantitative and qualitative results show that the proposed TDN achieves effective results with only a small number of unlabeled samples, partially overcoming the dependence of deep convolutional neural networks on large amounts of training data.
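As an illustration of the channel-level distillation constraint on the decoder described in research work (3), the following is a minimal sketch under common knowledge-distillation conventions; the temperature, the softmax-over-spatial-positions formulation, the matching channel dimensions, and all names are assumptions rather than the thesis's exact loss.

```python
import torch
import torch.nn.functional as F

def channel_wise_distillation_loss(student_feat: torch.Tensor,
                                   teacher_feat: torch.Tensor,
                                   temperature: float = 4.0) -> torch.Tensor:
    """Hypothetical channel-level constraint: each channel's activations over
    all spatial positions are normalized into a distribution, and the student
    is pushed to match the teacher's per-channel distribution."""
    n, c, h, w = student_feat.shape
    s = student_feat.reshape(n, c, h * w) / temperature
    t = teacher_feat.reshape(n, c, h * w) / temperature
    log_p_s = F.log_softmax(s, dim=-1)   # student: per-channel log-probabilities
    p_t = F.softmax(t, dim=-1)           # teacher: per-channel probabilities
    # KL divergence per channel, averaged over batch and channels.
    kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=-1)
    return (temperature ** 2) * kl.mean()

# Usage: align the student's context/decoder features with the teacher's
# (or the teacher assistant's, depending on the distillation phase).
student = torch.randn(2, 128, 32, 32, requires_grad=True)
teacher = torch.randn(2, 128, 32, 32)
loss = channel_wise_distillation_loss(student, teacher.detach())
loss.backward()
```

In the two-phase scheme described above, such a loss would be applied first between the teacher and the teacher assistant and then between the teacher assistant and the student.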
Keywords/Search Tags: image semantic segmentation, pseudo-label generation, lightweight fusion, knowledge distillation, few-sample unsupervised learning