With the development of satellite technology, remote sensing images have been widely used in environmental assessment, precision agriculture, urban planning, and other fields. Scene parsing extracts ground-object features and detailed information from remote sensing images, and semantic segmentation classifies an image pixel by pixel, assigning each pixel a class label. The rich information contained in remote sensing images provides data support for applications in many fields. However, faced with massive volumes of remote sensing data, analyzing them quickly, efficiently, and automatically has become an urgent problem. At present, deep learning methods achieve the best semantic segmentation results: by automatically learning image features, they extract abstract characteristics such as texture and shape. High-resolution remote sensing images, however, contain a large amount of detailed information about complex scenes and ground objects, and the size and data volume of samples are often uneven, which increases the difficulty of segmentation. The generalization ability of general-purpose networks on remote sensing images still needs to be improved, and as networks deepen, spatial location and scale features are gradually lost, so current segmentation methods leave room for improvement. Based on deep learning network models, this thesis studies semantic segmentation methods for remote sensing images and completes the following main tasks:

(1) A multi-branch semantic segmentation network, SegMPAN (SegNet with MP-ASPP and Attention module Net), is proposed on the basis of SegNet. The network adopts an encoder-decoder structure, in which the encoder performs feature extraction and the decoder performs upsampling. Two multi-parallel ASPP (MP-ASPP) modules are added to the encoder to fully extract multi-scale features from the feature maps and strengthen the network's feature extraction ability. Based on the attention mechanism, a lightweight attention module consisting of a channel attention module and a spatial attention module is proposed, so that important channels and feature points in the feature map receive larger weights. The MP-ASPP module and the attention module are combined in parallel to form a multi-scale attention module (MAM), which extracts multi-scale features while strengthening attention to key feature points; illustrative sketches of these modules are given below. Two MAM branches are added to the encoder, and the feature maps output by the encoder branches are fused with the corresponding decoder feature maps by concatenation, so that the injected semantic information enhances the network's feature recovery ability.
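The abstract describes the MP-ASPP block only at a high level. The following PyTorch sketch shows one plausible arrangement, assuming parallel 3x3 atrous convolutions at increasing dilation rates plus an image-level pooling branch, fused by a 1x1 convolution; the dilation rates and channel layout are illustrative assumptions, not the thesis's exact configuration.

```python
import torch
import torch.nn as nn


class MPASPP(nn.Module):
    """Illustrative multi-parallel ASPP block: parallel atrous convolutions at
    several dilation rates, concatenated and fused by a 1x1 convolution.
    The rates (1, 6, 12, 18) and the global-pooling branch are assumptions."""

    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # Image-level pooling branch for global context.
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
        )
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        # Upsample the pooled global feature back to the input resolution.
        g = nn.functional.interpolate(self.pool(x), size=(h, w), mode="bilinear", align_corners=False)
        return self.fuse(torch.cat(feats + [g], dim=1))
```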
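The channel and spatial attention modules are likewise only named here. The sketch below follows a common lightweight design (global-pooling channel weighting and a single-convolution spatial map) and combines it in parallel with the MPASPP class from the previous sketch to form a MAM-style unit; the weighting scheme and the summation used to merge the two paths are assumptions.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Re-weights channels from a globally pooled descriptor (a common lightweight design)."""

    def __init__(self, ch, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(nn.functional.adaptive_avg_pool2d(x, 1))


class SpatialAttention(nn.Module):
    """Re-weights spatial positions from channel-wise mean and max maps."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class MAM(nn.Module):
    """Multi-scale attention module: the MP-ASPP path and the attention path run
    in parallel on the same input; summing their outputs is an assumption."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # MPASPP is the class from the previous sketch.
        self.mpaspp = MPASPP(in_ch, out_ch)
        self.attn = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            ChannelAttention(out_ch),
            SpatialAttention(),
        )

    def forward(self, x):
        return self.mpaspp(x) + self.attn(x)
```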
(2) To address misclassified pixels and blurred segmentation boundaries, fully-connected conditional random fields (DenseCRFs) are used to post-process the segmentation results, further sharpening object boundaries. The SegMPAN, U-Net, and SegNet models are trained, DenseCRF post-processing is applied to the results of all three, and the three models are then fused by voting: the category receiving the most votes is taken as the final category of each pixel. This effectively reduces misclassified pixels and further improves segmentation accuracy.

(3) Experiments are conducted on the ISPRS and CCF datasets. Both datasets are preprocessed and augmented to build a large-scale remote sensing image dataset that covers more complex scenes. The proposed method is compared with traditional image segmentation methods, and its effectiveness is verified by the experimental data and results. The effects of the MP-ASPP module, the attention module, the MAM, DenseCRFs, and model fusion on the results are verified and analyzed. Finally, the proposed method achieves an MIoU of 70.92% and an F1-score of 82.01% on the ISPRS dataset, and an MIoU of 82.35% and an F1-score of 92.19% on the CCF dataset.
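The model fusion described in (2) is a per-pixel majority vote over the predictions of the three post-processed models. A minimal NumPy sketch of that vote is shown below; the class-map inputs and the tie-breaking rule (lowest class index wins on a tie) are assumptions.

```python
import numpy as np


def majority_vote(pred_maps, num_classes):
    """Fuse per-pixel class maps from several models by majority voting.

    pred_maps: list of (H, W) integer arrays, one per model (e.g. SegMPAN,
    U-Net, and SegNet after DenseCRF post-processing).
    Ties are broken in favour of the lowest class index (an assumption).
    """
    stacked = np.stack(pred_maps, axis=0)                       # (M, H, W)
    one_hot = stacked[..., None] == np.arange(num_classes)      # (M, H, W, C)
    votes = one_hot.sum(axis=0)                                 # (H, W, C)
    return votes.argmax(axis=-1)                                # (H, W)


# Example: three 2x2 predictions over 3 classes.
a = np.array([[0, 1], [2, 2]])
b = np.array([[0, 1], [1, 2]])
c = np.array([[0, 0], [1, 2]])
print(majority_vote([a, b, c], num_classes=3))  # [[0 1] [1 2]]
```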
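MIoU and F1-score are the two reported metrics. The sketch below computes both from a confusion matrix; whether the thesis averages F1 per class (macro) or over all pixels is not stated, so the macro average used here is an assumption.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes):
    """Accumulate a (C, C) confusion matrix cm[gt, pred] from label maps."""
    idx = gt.astype(np.int64) * num_classes + pred.astype(np.int64)
    return np.bincount(idx.ravel(), minlength=num_classes ** 2).reshape(num_classes, num_classes)


def miou_and_f1(cm):
    """Mean IoU and macro-averaged F1 from a confusion matrix cm[gt, pred]."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp   # predicted as class c but actually another class
    fn = cm.sum(axis=1) - tp   # actually class c but predicted as another class
    iou = tp / np.maximum(tp + fp + fn, 1)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    return iou.mean(), f1.mean()
```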