| Semantic segmentation plays an important role in map navigation, autonomous driving, and robotic systems. The rapidly growing demand for real-time semantic segmentation in such applications has stimulated interest in designing more efficient, lower-latency segmentation networks. However, some current semantic segmentation algorithms make insufficient use of multi-scale information, lose detail features, and suffer from a large resolution gap between high-level context information and low-level detail. To address these problems, this paper studies encoder-decoder architectures, and its main contributions are as follows: (1) To address discontinuous segmentation results and blurred context information in deep features, an implicit feature alignment module is proposed for decoding, combining the implicit feature alignment (IFA) function with a Transformer-based semantic decoding model. This module can effectively replace bilinear upsampling and convolution for aligning multi-level features, and it learns accurate, continuous semantic information. In the loss computation, a detail guidance module is constructed to guide the shallow layers to encode spatial information, which improves the model's ability to learn contextual semantic information and its segmentation performance on images of different resolutions without increasing the computational cost at inference. On the public ADE20K dataset, the proposed model improves mIoU by 1.11% over the original model. Experiments show that IFA allows low-level spatial details and high-level semantic information to be aligned accurately, yielding more accurate predictions. (2) To address the fact that existing semantic segmentation algorithms do not fully exploit multi-scale information and boundary detail features in the decoder, this paper combines the spatial pyramid pooling framework with the Transformer to design a semantic segmentation model that enhances multi-scale feature fusion in the decoding stage. In the encoding stage, the model obtains features at different scales through multiple parallel branches. In the decoder, a bottleneck spatial pyramid pooling module is first designed to enhance feature fusion. Second, this module is combined with the Laplacian pyramid method. Finally, the resolution of the different feature maps is restored through concatenation, so as to retain more detail features. On the public ADE20K dataset, the improved model raises mIoU by 1.36% over the original network while requiring only 51% of its floating-point operations, demonstrating its effectiveness in improving accuracy and reducing computation. |
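Both results above are reported as mIoU (mean intersection over union). As a reminder of what that metric measures, the following is a minimal sketch of the common definition, not the evaluation code used in this work. The function name `mean_iou`, the flat label-list input, and the `ignore_index=255` convention are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: int = 255,  # assumed "unlabeled" marker, hypothetical here
) -> float:
    """Mean IoU over the classes that occur in the prediction or ground truth."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes  # pixels where pred == target == c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labeled as class c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute to any class
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and labels
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes on six pixels (flattened label maps).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2. Real evaluation pipelines usually accumulate the intersection and union counts over the whole validation set before averaging, rather than per image.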