| In remote sensing, very high-resolution (VHR) aerial images are widely used because of their high imaging quality and low acquisition cost. Among the many kinds of objects in VHR remote sensing images, buildings are the most important, and extracted buildings are widely used in urban planning, population estimation, disaster prevention and many other tasks. With the development of computer science, building extraction is shifting from manual labelling to automatic extraction by computer. However, existing automatic building extraction methods cannot yet satisfy the requirements of practical applications. How to extract buildings from high-resolution remote sensing images automatically, effectively and accurately remains a difficult and actively studied problem in the remote sensing community. Therefore, the research in this dissertation has important academic value and engineering significance. Extracting buildings from VHR remote sensing images is a special image segmentation task, and it has advanced along with image segmentation technology. Recently, supported by high-performance computer hardware, deep learning has achieved remarkable improvements in many areas, especially audio and video processing. In image processing, the convolutional neural network (CNN) is the most widely used deep learning model because of its hierarchical feature extraction. However, owing to the limitations of deep learning methods and the differences between remote sensing images and natural images, three problems make the performance of deep learning methods unsatisfactory for building extraction from VHR remote sensing images. First, because of differences in shooting angle and height, the same building shows different shapes and orientations, as well as large distortions, in different remote sensing images. At the same time, buildings in different regions and terrains vary in appearance, which causes large
intra-class variance among buildings. Second, building boundaries are regular and sharp, while the background of buildings in VHR remote sensing images is complex. However, mainstream deep learning models cannot accurately extract the boundary areas of buildings. Finally, because of a limitation of current deep learning technology, a neuron cannot extract local and global features simultaneously, which affects the building extraction performance and robustness of deep learning methods. Therefore, extracting buildings from VHR remote sensing images is still challenging. To address these problems, this dissertation studies the network architecture, the loss function and the neuron, respectively, and proposes three novel building extraction methods, listed below: (1) To address the distortion, occlusion and significant intra-class variance of buildings in high-resolution remote sensing images, we propose a new nested network architecture, WebNet, which has better feature extraction ability. In addition, a new lossless hierarchical sampling (LHS) method is proposed to reduce the information loss caused by the frequent sampling operations of the nested architecture. WebNet can efficiently transfer both high-level and low-level features within the network, which significantly improves the model’s convergence speed and robustness and greatly enhances the visual and quantitative results of building extraction. Moreover, the number of parameters in WebNet can be flexibly adjusted for tasks with different precision requirements. (2) To address the unsatisfactory boundary extraction accuracy of the method proposed in (1), we propose a boundary-aware perceptual loss (BP loss) for building extraction from VHR remote sensing images. Compared with pixel-wise loss functions, the BP loss can learn the structural information of building boundary areas and embed it into building extraction networks, which gives the extracted buildings more regular and
sharp boundaries. (3) To address the poor feature extraction ability of the basic neuron, which still exists in methods (1) and (2), we propose dense hierarchical spatial Gaussian pooling (Dense-HSGP) for building extraction. Dense-HSGP is based on a new convolution, named Gaussian convolution. The local and global information extraction abilities of Gaussian convolution can be flexibly adjusted by using different Gaussian kernels. Dense-HSGP stacks a number of Gaussian convolutions with different Gaussian kernels, which significantly enriches the model’s receptive fields and reinforces the performance and robustness of the building extraction model. The methods proposed in this dissertation have been tested and evaluated on the open-source Inria Aerial Image Labeling Dataset and WHU Aerial Building Dataset. The experimental results indicate that the proposed methods achieve significant improvements in the official evaluation metrics. Among them, WebNet combined with LHS achieves state-of-the-art (SOTA) results on both datasets. In addition, training two commonly used building extraction models with BP loss makes the boundaries of the extracted buildings more regular and sharper. Also, embedding Dense-HSGP into building extraction models yields substantial performance improvements in every area of the Inria dataset. Finally, we fuse the three methods and achieve better building extraction performance, which demonstrates the effectiveness, convenience and superiority of the proposed methods and their benefit for the application of extracting buildings from VHR aerial images. |
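
The abstract refers to official evaluation metrics without naming them. Building extraction benchmarks commonly report intersection over union (IoU) and pixel accuracy on binary building masks, so the sketch below assumes those two metrics. The function names (`confusion_counts`, `building_iou`, `pixel_accuracy`) and the nested-list mask format are hypothetical choices for illustration, not the dissertation's evaluation code.

```python
from typing import Sequence, Tuple

# Assumed input format: a 2D binary mask, 1 = building pixel, 0 = background.
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) pixel counts for the building class."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("pred and truth must have the same shape")
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1  # building predicted, building in ground truth
            elif p:
                fp += 1  # building predicted, background in ground truth
            elif t:
                fn += 1  # background predicted, building in ground truth
            else:
                tn += 1  # background in both
    return tp, fp, fn, tn


def building_iou(pred: Mask, truth: Mask) -> float:
    """IoU of the building class: TP / (TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    union = tp + fp + fn
    # Assumption: two empty masks count as perfect agreement.
    return tp / union if union else 1.0


def pixel_accuracy(pred: Mask, truth: Mask) -> float:
    """Fraction of pixels whose predicted label matches the ground truth."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 1.0


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    truth = [[1, 0, 0],
             [0, 1, 1]]
    print(building_iou(pred, truth))    # 0.5  (TP=2, FP=1, FN=1)
    print(pixel_accuracy(pred, truth))  # 4 of 6 pixels agree
```

IoU ignores true-negative background pixels, so it is less flattering than pixel accuracy when buildings cover only a small part of an image, which is one reason the two are usually reported together.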