
Remote Sensing Scene Classification Of High Spatial Resolution Images Based On Deep Feature Expression

Posted on: 2023-08-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K J Xu
Full Text: PDF
GTID: 1522306821973669
Subject: Instrument Science and Technology

Abstract/Summary:
For massive remote sensing images, it is of great social and commercial value to complete the information value-added from data to knowledge through efficient and convenient intelligent interpretation. As a crucial component of remote sensing image interpretation, scene classification aims to accurately interpret semantic categories by modeling ground objects and spatial patterns, and it can fully reflect the high-level semantic information of images. Because it describes spatial elements, structures, and patterns more intuitively, scene classification has become a crucial research topic in the remote sensing community. High-resolution remote sensing images are characterized by diverse ground objects, complex background interference, variable appearance of same-class objects, rich spatial patterns, and high homogeneity between ground objects, which leads to the challenges of high inter-class similarity and large intra-class variation in scenes. Therefore, it is necessary to extract more discriminative scene-level semantic features to improve classification accuracy. Compared with traditional hand-crafted features that rely on domain knowledge, deep learning can extract abstract semantic features and can be employed to represent the high-level semantics of scenes. However, existing deep learning methods for remote sensing scene classification suffer from a key problem: the expression of deep features is not comprehensive enough. Specifically, multi-level features, key region attributes, contextual relations, and spectral information are insufficiently explored. Considering the above facts, this thesis focuses on deep semantic feature expression of scenes based on deep learning and transfer learning theories, and scene classification methods are developed at four levels: deep coding feature learning, global-local structure exploration, contextual semantic understanding, and spatial-spectral joint representation. The main contents and contributions of this thesis are as follows:

(1) Because the potential information in pre-trained convolutional neural networks (CNNs) has not been fully explored, a hierarchical feature fusion of convolutional neural network (HFFCNN) is proposed to effectively mine deep multivariate information. The proposed framework contains two parallel modules that deal with convolutional and fully connected features. The first module is an adaptive spatial-wise attention-based multi-scale nonlinear bag-of-visual-words (ASA-MNBoVW). When ASA-MNBoVW is employed to encode multi-layer convolutional feature maps, it not only adequately considers the multivariate nonlinear relationships between local features, but also explores key spatial region responses and multi-scale information. The second module adopts a weighted image pyramid (WIP) structure to reveal the multi-scale global geometric information of images by aggregating the fully connected features of local image patches. The fusion and complementation of these hierarchical features effectively reveal the rich information of the pre-trained network.

(2) For methods that employ a pre-trained CNN model as a feature extractor, the difference between remote sensing images and natural images is not considered, which limits the discriminative ability of the model. A two-stream feature aggregation deep neural network (TFADNN) is constructed and fine-tuned to achieve a more complete deep feature expression. The method can be integrated with existing convolutional neural networks, and it mainly consists of two parallel parts: a stream of discriminative features and a stream of general features. In the first stream, global average pooling is introduced to replace the fully connected layer of the pre-trained network, which effectively learns discriminative features of scenes at arbitrary scale. In the stream of general features, a multi-scale nonlinear encoding-based bag-of-visual-words (MNBoVW) is proposed to process convolutional feature maps, and a more generalizable multi-scale coding feature representation is obtained. Finally, the two-stream features are combined to achieve multivariate deep feature extraction of remote sensing scenes.

(3) To address the limited classification accuracy of CNN-related methods caused by the complex and changeable scale, texture, and shape of objects in intricate scenes, as well as background interference and occlusion, a global-local dual-branch structure (GLDBS) model is designed to fully explore the global-local structure of remote sensing scenes. GLDBS extracts convolutional energy maps from a ResNet-18/ResNet-34 dual-branch model and converts them into binary images to obtain the coordinates of the largest connected area, which is the crucial local area of the scene. Then, a global-local two-stream structure is designed to separately learn discriminative features from original images and local key image patches, and a joint loss is proposed to effectively coordinate the two streams so that feature separability is improved. Because the two features focus on information from different perspectives, the discriminative ability of the global feature stream can be effectively improved by fusing the decision results of the local feature stream, especially for easily confused semantic classes.

(4) To relieve the problem that CNN-based methods ignore the contextual relationships of ground objects, resulting in an insufficient description of the geometric structure and spatial layout of scenes, scene classification methods based on graph convolutional networks (GCNs) are proposed to explore contextual semantic understanding. Specifically, a deep feature aggregation framework driven by a graph convolutional network (DFAGCN) is designed. DFAGCN employs a pre-trained CNN to obtain multi-layer features, and a GCN model is introduced to reveal the correlations of local features in convolutional feature maps, obtaining more fine-grained contextual features. By introducing three weighting coefficients, multi-layer convolutional features and fully connected features can be effectively integrated. Because DFAGCN is not an end-to-end network, which limits its applicability, a CNN-GCN joint network (CGJNet) is further developed for remote sensing scene classification. CGJNet includes two modules: a CNN stream and a GCN stream. The CNN stream extracts global features of remote sensing images based on a pre-trained DenseNet-121. The GCN stream adopts convolutional feature maps obtained by a pre-trained VGGNet-16 to construct adjacency matrices, and a GCN model is constructed to capture the contextual features of scenes. As a result, competitive classification results are achieved by integrating global and contextual features.

(5) Considering that high spatial resolution images are mostly three-channel or single-channel, so that scenes with similar visual appearance are hard to discriminate, while existing hyperspectral image data sets are difficult to use directly for scene classification, a hyperspectral remote sensing data set for scene classification (HSRS-SC) is developed to explore spatial-spectral joint representations. The data set is divided into five typical categories and contains a total of 1385 scenes; each image possesses a spatial resolution of 1 meter and 48 spectral bands, providing rich spectral and spatial information. To evaluate the impact of spatial-spectral joint representation on scene classification performance, three sets of experiments are designed based on AlexNet, VGGNet-16, GoogLeNet, and ResNet, covering band selection, full-spectrum information, and a spatial-spectral attention mechanism. The experimental results show that the introduction of spectral features improves the discrimination ability of scenes, and the constructed data set provides a good data foundation for the study of hyperspectral scene semantic understanding.
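To make the coding-feature idea in contribution (1) concrete, the following is a minimal, hypothetical sketch of a plain bag-of-visual-words (BoVW) encoding of a convolutional feature map: each local feature vector is assigned to its nearest codeword and the assignments are pooled into a normalized histogram. This is only the classical baseline the thesis builds on; the proposed ASA-MNBoVW additionally applies nonlinear multi-scale encoding and spatial-wise attention, which are not shown here. The codebook and feature sizes are arbitrary illustrative values.

```python
import numpy as np

def bovw_encode(feature_map, codebook):
    """Plain BoVW: feature_map (H, W, C) conv features, codebook (K, C) visual words."""
    locals_ = feature_map.reshape(-1, feature_map.shape[-1])       # (H*W, C) local features
    # squared Euclidean distance from every local feature to every codeword
    d = ((locals_[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    assign = d.argmin(axis=1)                                      # hard assignment per feature
    hist = np.bincount(assign, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()                                       # L1-normalized histogram

rng = np.random.default_rng(0)
fmap = rng.standard_normal((7, 7, 64))       # e.g. a 7x7x64 conv feature map
codebook = rng.standard_normal((16, 64))     # 16 visual words (illustrative)
code = bovw_encode(fmap, codebook)           # 16-dim scene descriptor
```

In the multi-scale variant described in the thesis, such histograms would be computed from several convolutional layers and scales and then fused; here a single layer suffices to show the encoding step.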
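Contribution (2) relies on the fact that global average pooling (GAP) accepts inputs of arbitrary scale. A short sketch shows why: whatever the spatial size of the final convolutional feature map, GAP collapses it to one value per channel, so the classifier's input dimension stays fixed. The shapes below are illustrative, not taken from TFADNN.

```python
import numpy as np

def gap(feature_map):
    """Global average pooling: (C, H, W) feature map -> (C,) channel descriptor."""
    return feature_map.mean(axis=(1, 2))

small = np.ones((512, 7, 7))      # feature map from a small input image
large = np.ones((512, 14, 14))    # feature map from a larger input image
# Both collapse to the same 512-dim vector, so one classifier serves all scales.
assert gap(small).shape == gap(large).shape == (512,)
```

This is why replacing the fully connected layer with GAP lets the fine-tuned network learn discriminative scene features at arbitrary input scale.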
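The key-region localization step of GLDBS in contribution (3) can be sketched as follows: binarize an energy map, find the largest connected foreground component, and return its bounding box as the crucial local patch. This is a hypothetical reconstruction, not the authors' code; the mean threshold and 4-connectivity are illustrative choices, and in the thesis the energy map comes from the ResNet-18/ResNet-34 branches rather than the toy array used here.

```python
import numpy as np
from collections import deque

def key_region(energy):
    """Bounding box (top, left, bottom, right) of the largest 4-connected region
    after thresholding the energy map at its mean (illustrative threshold)."""
    mask = energy > energy.mean()
    seen = np.zeros_like(mask, dtype=bool)
    best = []
    H, W = mask.shape
    for i in range(H):
        for j in range(W):
            if mask[i, j] and not seen[i, j]:
                comp, q = [], deque([(i, j)])   # BFS over one connected component
                seen[i, j] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(comp) > len(best):
                    best = comp                  # keep the largest component
    ys = [p[0] for p in best]
    xs = [p[1] for p in best]
    return min(ys), min(xs), max(ys), max(xs)

energy = np.zeros((8, 8))
energy[2:5, 2:6] = 1.0               # one bright blob as a stand-in energy map
print(key_region(energy))            # -> (2, 2, 4, 5)
```

The returned coordinates would then be used to crop the local key image patch fed to the second stream.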
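For contribution (4), the core propagation step of a GCN stream can be illustrated with a single graph-convolution layer in the Kipf-Welling style, H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W). This is a generic sketch of how context flows between local features; the thesis builds the adjacency matrix from pre-trained VGGNet-16 feature maps, whereas the random symmetric adjacency below is purely illustrative.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: symmetrically normalized adjacency, then ReLU."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d = A_hat.sum(axis=1)                        # node degrees (>= 1, so safe to invert)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

rng = np.random.default_rng(1)
A = (rng.random((5, 5)) > 0.5).astype(float)
A = np.maximum(A, A.T)                           # make the adjacency symmetric
H = rng.standard_normal((5, 8))                  # 5 local-feature nodes, 8-dim each
W = rng.standard_normal((8, 4))                  # learnable weights (random here)
out = gcn_layer(A, H, W)                         # (5, 4) context-aware node features
```

Stacking such layers lets each local feature aggregate information from its neighbors, which is the sense in which the GCN stream captures contextual relations between ground objects.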
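Finally, for the spatial-spectral experiments in contribution (5), one common way to weight informative bands of a 48-band HSRS-SC patch is a squeeze-and-excitation-style spectral attention. The sketch below is only a plausible illustration of that idea; the exact attention design, bottleneck width, and weights used in the thesis experiments are not specified in this abstract, so `w1` and `w2` here are random stand-ins.

```python
import numpy as np

def spectral_attention(cube, w1, w2):
    """cube: (B, H, W) hyperspectral patch with B bands; w1, w2: bottleneck weights."""
    z = cube.mean(axis=(1, 2))                                      # squeeze: per-band statistic
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))       # band weights in (0, 1)
    return cube * s[:, None, None]                                  # reweight every band

rng = np.random.default_rng(2)
cube = rng.standard_normal((48, 32, 32))     # 48 spectral bands, 32x32 spatial patch
w1 = rng.standard_normal((12, 48)) * 0.1     # bottleneck 48 -> 12 (illustrative width)
w2 = rng.standard_normal((48, 12)) * 0.1     # expand back 12 -> 48
out = spectral_attention(cube, w1, w2)       # same shape, spectrally reweighted
```

Such reweighting is one way the extra spectral dimension can sharpen the discrimination of scenes that look alike in three-channel imagery.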
Keywords/Search Tags: Remote sensing, High resolution images, Scene classification, Deep feature expression, Feature fusion