Image semantic segmentation is a fundamental technology in machine vision; its goal is to assign a class label to every pixel in an image. It is widely used in agricultural automation, intelligent quality inspection, medical monitoring, and other fields. As technology has developed, image acquisition has become convenient and fast, image data keeps growing, and computing power has increased, so the theory and application of image segmentation methods also continue to advance. Most existing methods can meet the accuracy requirements of semantic segmentation, but they cannot be effectively applied in scenarios with strict real-time requirements, such as virtual reality, modern agriculture, and autonomous driving. To address these limitations, we propose two real-time semantic segmentation networks based on feature fusion, which combine residual networks, attention mechanisms, depthwise separable convolutions, and multi-scale feature maps. A series of experiments shows that the feature-fusion-based methods proposed in this paper meet the accuracy and speed requirements of real-time semantic segmentation and can be applied to real-time segmentation scenarios. The main research contents are as follows:

Considering that real-time semantic segmentation must take both semantic information and location information into account, existing feature fusion methods are analyzed, and a dual-channel feature fusion module is proposed to guide the fusion of these two types of information. A lightweight attention module is designed to extract the feature maps of real interest while reducing the amount of computation. On this basis, a real-time semantic segmentation network based on dual-channel feature fusion is proposed by combining these modules with a lightweight residual network with high feature extraction efficiency. The network first uses the residual network and the lightweight attention module to quickly extract the location information and semantic information of the image; the dual-channel feature fusion module then guides the fusion of the corresponding feature maps. On CamVid, the mIoU reaches 67.8% at a running speed of 52.6 fps; on the Cityscapes dataset, the mIoU reaches 73.5% at 31.8 fps.

To address the loss of accuracy caused by blurred object contours in real-time semantic segmentation, existing solutions are analyzed. Drawing on the characteristics of the adaptive multi-scale feature fusion module, the structure of each branch in the module is optimized: the information extracted by each branch is dynamically updated with an attention mechanism, and the function of each branch is refined. The high-resolution branch serves as the main branch, while the other branches are used only as auxiliary branches during training. To optimize multi-scale feature extraction, computationally cheaper depthwise separable convolutions are introduced, and their parameters are adjusted to fit the backbone network. Based on the above modules and the residual network ResNet-18, a real-time semantic segmentation network based on multi-scale feature fusion is proposed. The proposed method achieves an mIoU of 67.9% on the CamVid dataset at a segmentation speed of 53.0 fps; on the Cityscapes dataset, the mIoU reaches 73.9% at 33.3 fps.
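The abstract does not describe the internal structure of the dual-channel feature fusion module, the lightweight attention module, or the depthwise separable convolutions it mentions. The PyTorch sketch below is therefore only an illustrative assumption of how a spatial (location) branch and a semantic branch could be fused under channel attention, with a depthwise separable convolution used for refinement; the layer choices and module names are hypothetical and do not reproduce the thesis design.

```python
# Illustrative sketch only: layer configuration is assumed, not taken from the thesis.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))


class DualChannelFusion(nn.Module):
    """Hypothetical fusion of a spatial (location) branch and a semantic branch.

    A channel-attention vector computed from the semantic features re-weights
    the spatial features before the two branches are summed and refined.
    """

    def __init__(self, channels):
        super().__init__()
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),               # global context of the semantic branch
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),                          # per-channel weights in [0, 1]
        )
        self.refine = DepthwiseSeparableConv(channels, channels)

    def forward(self, spatial_feat, semantic_feat):
        weights = self.attention(semantic_feat)    # shape (N, C, 1, 1)
        fused = spatial_feat * weights + semantic_feat
        return self.refine(fused)


if __name__ == "__main__":
    spatial = torch.randn(1, 128, 64, 64)    # high-resolution location features
    semantic = torch.randn(1, 128, 64, 64)   # semantic features upsampled to the same size
    out = DualChannelFusion(128)(spatial, semantic)
    print(out.shape)  # torch.Size([1, 128, 64, 64])
```

The sketch keeps the two properties highlighted in the abstract: the attention path adds little computation (a global pooling plus one 1x1 convolution), and the fusion output is refined with a depthwise separable convolution rather than a standard convolution.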