| In recent years, with the rapid development of high-resolution Earth observation technology, the volume of high-resolution remote sensing imagery has grown rapidly, and its spatial resolution has improved from the meter level to the sub-meter level. High-resolution remote sensing images exhibit clear geometric structure and rich spatial detail, which provide an important basis for image interpretation. At present, the understanding of high-resolution remote sensing images has largely moved from the pixel level (e.g., metal materials, grassland) to the object level (e.g., airplanes, vehicles), which has greatly improved both the accuracy and the hierarchy of interpretation. However, object-level methods stop at the bounding box; they cannot directly bridge the "semantic gap" to capture the high-level semantics of complex scenes that contain multiple ground objects (e.g., airports, residential areas, commercial areas). Traditional pixel-level, object-level, and scene-level understanding all rely on handcrafted operators to extract features, so the extracted features generalize poorly and the extraction process is only partly automated. Given the micro-to-macro hierarchy in which ground objects are composed in high-resolution remote sensing images, how to bridge the "semantic gap" and achieve automatic, integrated "pixel-object-scene" deep understanding is an urgent and significant research topic. To bridge the "semantic gap" between low-level features and high-level semantics, research on pixel-level, object-level, and scene-level understanding has been conducted
separately. However, existing pixel-level, object-level, and scene-level methods are commonly based on handcrafted features and shallow classifiers. As a result, the following problems remain in understanding high-resolution remote sensing images: (1) Pixel-level information is underused. Most traditional pixel-level classification methods use spectral or spatial information separately, rely on human experience to extract features, and lack an effective spatial-spectral feature representation. (2) The spatial distribution of objects is difficult to take into account. Traditional object detection methods ignore the spatial distribution of objects in high-resolution remote sensing images, such as neighboring relationships and scale variation, which easily leads to missed detections and degrades detection performance. (3) The ability to extract high-level scene semantics is limited. In traditional scene recognition methods, feature extraction depends on human experience, which limits accurate and automatic extraction of scene semantics. To overcome these problems, this thesis studies "pixel-object-scene" deep understanding of high-resolution remote sensing images. The main contents are as follows: (1) The thesis first systematically summarizes the relevant theories and methods for pixel-level, object-level, and scene-level understanding of high-resolution remote sensing images. It analyzes the data characteristics of such images and the difficulties at each level of understanding, and reviews the current state of research at each level. (2) For pixel-level understanding, an unsupervised feature learning algorithm for
spatial-spectral pixel-level classification of high-resolution remote sensing images is proposed. To address the difficulty of acquiring annotated samples, the insufficient use of spatial and spectral information, and the limited capacity for automatic feature extraction, this thesis proposes an unsupervised convolutional sparse auto-encoder classifier for spatial-spectral pixel-level classification. The classifier builds on sparse auto-encoder feature extraction and on convolutional and pooling feature representation, combined with the proposed window-in-window joint spatial-spectral representation model. (3) For object-level understanding, two methods based on Faster R-CNN are proposed to address missed detections caused by neighboring objects and by multi-scale objects, respectively. Because annotated samples are limited for certain object detection tasks, and traditional detection algorithms do not resolve the missed detections caused by neighboring objects, a generalized bounding box conservation object detection method (Faster R-G-CNN) is proposed on the basis of Faster R-CNN. Faster R-G-CNN alleviates the neighboring bounding box problem at the post-processing stage with a generalized non-maximum suppression algorithm, and adopts transfer learning to improve detection efficiency. To address missed detections caused by multi-scale objects in high spatial resolution remote sensing imagery, a multi-scale augmentation guided Faster R-CNN object detection framework (Faster MSA-R-CNN) is proposed. Faster MSA-R-CNN handles multi-scale variation both at the data preprocessing level and within the Faster R-CNN framework. In addition, to improve the detection performance of the
multi-scale objects in high-resolution remote sensing images, transfer learning is adopted to improve detection efficiency. (4) For scene-level understanding, deep-learning-based scene understanding methods are proposed for the unannotated-sample and annotated-sample cases, respectively. Because annotated samples are lacking and scene understanding methods based on middle-level feature coding cannot extract features automatically, the first proposed method, an unsupervised hierarchical convolutional sparse auto-encoder (HCSAE), extracts high-level scene semantics by hierarchically extending the single-level convolutional sparse auto-encoder scene understanding method (single-level CSAE), following the hierarchical idea from deep learning research. Single-level CSAE consists of sparse auto-encoder feature extraction and convolutional and pooling feature representation. Because multi-scale objects in high-resolution remote sensing scene images decisively influence scene semantic recognition, and annotated samples are limited for certain scene understanding tasks, a variant of the AlexNet convolutional neural network that incorporates spatial pyramid pooling and supervised information (AlexNet-SPP-SS) is proposed to improve scene recognition accuracy. In addition, a transfer learning and pre-training strategy is adopted (Pre-trained-AlexNet-SPP-SS) to improve the accuracy and efficiency of scene understanding. (5) A "pixel-object-scene" deep understanding framework based on deep learning theory is built for high-resolution remote sensing images. By combining deep-learning-based understanding methods that account for ground objects at different scales, a prototype system for "pixel-object-scene" deep understanding of high-resolution remote sensing images was constructed and applied in object detection experiments on large-scale high-resolution remote sensing images. |
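
For readers unfamiliar with the post-processing step mentioned in contribution (3), the following is a minimal sketch of the standard greedy non-maximum suppression commonly used in Faster R-CNN style detectors. It is not the generalized variant proposed in the thesis, whose details the abstract does not give; the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the threshold values are illustrative assumptions. The toy example shows how two heavily overlapping boxes, which could correspond to two neighboring objects, lose one detection under a typical threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Standard greedy NMS (baseline, not the thesis's generalized variant).

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical toy data: two neighboring boxes and one isolated box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

kept_default = nms(boxes, scores)       # threshold 0.5
kept_lenient = nms(boxes, scores, 0.7)  # more permissive threshold
```

In this toy case the first two boxes have an IoU of 81/119 (about 0.68), so the lower-scoring one is suppressed at a threshold of 0.5 and retained at 0.7. That sensitivity to overlap between adjacent objects is the kind of missed detection the abstract attributes to traditional post-processing.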