In recent years, with the continuous development of remote sensing satellites and imaging technology, remote sensing data have become increasingly abundant. There is therefore an urgent need to automatically extract useful information from remote sensing images containing rich ground objects. Semantic segmentation of remote sensing images is a key research direction in the field of remote sensing image parsing and is widely used in urban planning, disaster assessment, and change detection. Semantic segmentation assigns each pixel of an image to an object category, enabling pixel-level analysis of image content. To improve segmentation accuracy, existing semantic segmentation techniques usually enlarge the dataset and deepen the network, forming a complex convolutional neural network for feature extraction. The cost of this approach, however, is a sharp increase in GPU memory demand during training, along with additional computation at inference time. Real-time semantic segmentation uses a lightweight encoder to speed up training, but the features extracted by such an encoder are not rich enough, leading to low classification accuracy for small or blurred objects. In addition, the down-sampling adopted to reduce computation loses pixel information and thus degrades the final segmentation accuracy, while direct up-sampling cannot fully restore the lost pixels, producing blurred segmentation boundaries and related problems. Therefore, in the context of remote sensing image parsing, this thesis focuses on the trade-off between segmentation accuracy and speed in real-time semantic segmentation, and applies real-time semantic segmentation theory to change detection in multi-temporal remote sensing images. The main research work of this thesis is as follows:

(1) The trade-off between feature representation capability and spatial localization accuracy is crucial for dense classification, i.e. semantic segmentation, of remote sensing images. To better balance the low-level spatial details in the shallow layers and the high-level abstract semantics in the deep layers, a lightweight network is constructed based on bilateral attention refinement, which uses the fine-grained shallow features to refine high-level semantic features and capture deeper information. The network adopts an asymmetric encoder-decoder structure to perform real-time semantic segmentation. In the encoder, a lightweight residual unit with a bottleneck structure is designed to achieve lighter, more efficient, and more powerful feature extraction. In the decoder, a local attention enhancement module is proposed to adaptively enhance the feature representation. In addition, to better fuse high- and low-level features, a global context embedding module is proposed. This module splits the high-level features into two branches: one branch produces a weight vector that guides low-level feature learning, and the other produces a semantic vector that is used to compute a multi-label category loss, which is incorporated into the overall loss function to better regularize the training process. The effectiveness and efficiency of the proposed method are verified on the Potsdam and CCF datasets. The experimental results show that the model using these strategies outperforms the baseline network in mIoU, PA, and F1, with gains of 18.86%, 16.21%, and 15.64% on the Potsdam dataset and 10.51%, 6.53%, and 8.19% on the CCF dataset.

(2) To verify the feasibility of applying a deep-learning-based real-time semantic segmentation framework to change detection in multi-temporal remote sensing images, change detection is formulated as a binary segmentation problem of changed versus unchanged pixels, and an end-to-end convolutional neural network is trained on a large amount of data to automatically extract change-related features and generate a change map. This thesis uses a dual-branch Siamese network to jointly extract the features of a given image pair, and an adaptive channel fusion module to fuse channel pairs at multiple feature levels. In addition, it is observed from the datasets that the spatial distribution of the changed regions is diverse and their sizes vary, so a local-scale feature integration module is further proposed to detect changes in regions of different sizes. Finally, experiments are performed on the CDD and DSIFN datasets. The results show that the proposed framework achieves superior performance, with significant improvements in precision (P), recall (R), and F1 score.
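The channel-wise guidance performed by the global context embedding module can be illustrated with a minimal sketch: the high-level features are pooled into a per-channel context value, squashed into a weight, and used to gate the corresponding low-level channel. The function names (`global_avg_pool`, `reweight_low_level`) and the plain-Python nested-list tensor layout are illustrative assumptions, not the thesis implementation.

```python
import math

def global_avg_pool(feat):
    """Per-channel global average over an H x W map.
    feat: list of C channel maps, each a list of rows of floats."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feat]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reweight_low_level(low_feat, high_feat):
    """Gate each low-level channel by a weight derived from the
    corresponding high-level channel's global context."""
    weights = [sigmoid(v) for v in global_avg_pool(high_feat)]
    return [[[w * v for v in row] for row in ch]
            for w, ch in zip(weights, low_feat)]
```

In a real network the pooled vector would typically pass through learned layers before the sigmoid; here the pooling-gating path alone is shown to convey how high-level context modulates low-level detail.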
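The multi-label category loss computed from the semantic vector can be sketched as a binary cross-entropy over class-presence labels, where the target for class k is 1 if that class appears anywhere in the image. This is one standard formulation, assumed here for illustration rather than taken from the thesis.

```python
import math

def multi_label_bce(logits, targets):
    """Binary cross-entropy over a class-presence vector.
    logits: raw per-class scores; targets: 0/1 presence labels."""
    loss = 0.0
    for z, t in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid, per-class probability
        loss -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return loss / len(logits)
```

An auxiliary loss of this form would be added to the main per-pixel segmentation loss, pushing the semantic vector to encode which categories are present in the scene.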
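Formulating change detection as binary segmentation can be illustrated with a minimal Siamese-style sketch: one shared encoder maps both temporal images to features, and pixels whose feature difference exceeds a threshold are marked as changed. The `encode` callable and the fixed threshold are placeholders; the thesis network learns this decision end-to-end rather than thresholding a hand-set value.

```python
def change_map(img_a, img_b, encode, threshold=0.5):
    """Siamese change detection sketch: the same encoder (shared
    weights) processes both images; a pixel is marked changed (1)
    when its feature difference exceeds the threshold."""
    feat_a, feat_b = encode(img_a), encode(img_b)
    return [[1 if abs(a - b) > threshold else 0
             for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(feat_a, feat_b)]
```

The shared encoder is what makes the comparison meaningful: both images are projected into the same feature space, so their difference reflects scene change rather than encoder mismatch.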