Image semantic segmentation is a significant subfield of computer vision that aims to assign a category label to every pixel in an image. Semantic segmentation techniques are now widely applied in domains such as autonomous driving, medical imaging, and agricultural remote sensing. With the continuous advancement of deep learning, and in particular the advent of fully convolutional networks, several end-to-end semantic segmentation methods have been proposed. Although these methods achieve commendable results, they still exhibit certain limitations. Building on this foundation, this study presents a novel encoding-decoding network with a pyramidal representation module, referred to as EDPNet (Encoding-Decoding network with Pyramidal representation), designed for efficient image semantic segmentation. The main contributions of this paper are as follows. On one hand, the encoding stage of EDPNet employs an enhanced Xception (Xception+) as the feature extraction backbone to capture discriminative features from input images; the resulting feature maps are fed into a pyramid module, which learns and refines image features through multi-scale feature representation and aggregation. On the other hand, the decoding stage of EDPNet uses skip connections to progressively upsample the deep feature maps, which carry rich semantic information, and concatenate them with the shallow feature maps, which provide abundant spatial detail. EDPNet thus combines the strengths of encoding-decoding networks and pyramid networks: it has global perceptual capability, effectively captures the fine-grained contours of diverse geographical objects, and reduces training time. EDPNet is compared against several state-of-the-art models, including FCN, LRASPP, EDPNet-c6, PSPNet, DeepLabv3, U-Net, and HRNet, on four publicly available datasets: eTRIMS, Cityscapes, PASCAL VOC 2012, and CamVid. EDPNet achieves the highest accuracy, with mIoUs of 83.6% and 73.8% on the eTRIMS and PASCAL VOC 2012 datasets, respectively. On the remaining two datasets, its accuracy is on par with the PSPNet, U-Net, DeepLabv3, and HRNet models, while requiring significantly less training time.
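The decoder's skip-connection mechanism described above (upsample a deep, semantically rich feature map and concatenate it with a shallow, spatially detailed one) can be sketched as follows. This is a minimal illustrative sketch, not EDPNet's actual implementation: the channel counts, spatial sizes, and the use of nearest-neighbor upsampling are assumptions chosen for clarity, and a real network would typically use bilinear interpolation or transposed convolutions on learned feature maps.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def skip_concat(deep, shallow):
    """Upsample the deep (semantic) feature map to the shallow map's
    spatial resolution, then concatenate along the channel axis --
    the basic operation of an encoder-decoder skip connection."""
    factor = shallow.shape[1] // deep.shape[1]
    up = upsample_nearest(deep, factor)
    return np.concatenate([up, shallow], axis=0)

# Hypothetical shapes: deep map with 256 channels at 8x8,
# shallow map with 64 channels at 32x32.
deep = np.random.rand(256, 8, 8)
shallow = np.random.rand(64, 32, 32)
fused = skip_concat(deep, shallow)
print(fused.shape)  # (320, 32, 32)
```

Applying this fusion progressively at several resolutions, as the abstract describes, lets the decoder recover object boundaries that were lost during downsampling in the encoder.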