| With the rapid development of remote sensing technology, the quality and update frequency of remote sensing data have improved significantly. Multi-source remote sensing data are widely used in agriculture, forestry, marine science, environmental protection and other fields. Remote sensing image classification, which uses remote sensing data to produce land use or land cover maps, has long been an active research topic in remote sensing applications. Applications based on artificial intelligence models and algorithms are now common, and machine learning and deep learning are the main methods for realizing artificial intelligence. Driven by continuous innovation in deep learning, computer vision has developed rapidly in the past few years and has produced repeated breakthroughs. The driving factors are algorithmic innovation, the growing volume of visual data and the improvement of computing power. In image classification, object detection and localization, image segmentation and other tasks, deep learning algorithms have surpassed traditional statistical methods on a large number of benchmarks, and have even surpassed humans in image and object recognition. In agriculture, crop classification from remote sensing data is an important research topic. Exploiting the spatial and temporal coverage of remote sensing images to obtain the spatial distribution and planting area of crops in a timely and accurate manner is of great significance for ensuring food security and achieving sustainable agricultural development. High-resolution remote sensing images are characterized by complex backgrounds, rich detail and diverse spatial structures, so classification accuracy is often low when traditional machine learning classifiers are applied to them. In recent 
years, many researchers have built semantic segmentation networks with deep learning algorithms and applied them to fine-grained, pixel-level classification of ground objects. Remote sensing image classification based on artificial neural networks has become a development trend. This paper focuses on applying semantic segmentation algorithms based on artificial neural networks to the classification of high-resolution remote sensing images. The main research contents are as follows: (1) To address the training difficulties and network degradation caused by increasing depth in deep convolutional neural networks, this paper proposes CSNet, a deep convolutional neural network that uses a residual network as its encoder. Compared with existing models, CSNet is deeper, structurally simple and easy to train. First, the backbone network is built on a 50-layer residual structure so that the features of the input image are fully extracted. Then, skip connections are added between the encoder and the decoder for feature fusion, so that the output feature map contains both high-resolution low-level features and abstract features from a high-dimensional space. To examine how small structural adjustments affect model performance, two schemes are used to modify the internal structure of the residual block. In a crop classification experiment on Gaofen-1 remote sensing images, the overall accuracy (OA) of the modified CSNet (ResNet C) was 13.3% and 9.5% higher than that of the baseline models (random forest and support vector machine), respectively. (2) Because multi-scale neural networks can learn features under different receptive fields and thereby improve accuracy in fine-grained image classification, this paper proposes MSSNet, a multi-scale feature fusion network with a parallel multi-branch structure. First, the network uses multi-branch asymmetric convolution and dilated convolution. Each 
branch cascades conventional convolutions with kernels of different sizes and dilated convolutions with different dilation rates. Then, the features extracted by the branches are concatenated to achieve multi-scale feature fusion. Finally, a skip connection combines low-level features from the shallow layers with abstract features from the deep layers to further enrich the semantic information. In crop classification experiments on Sentinel-2 remote sensing images, the classification accuracy of MSSNet was 7.81%, 3.59% and 2.83% higher than that of FCN-32s, FCN-8s and UNet, respectively, and the output crop classification map delineated fields and the edges of ground objects more effectively. (3) By improving SegNet, a lightweight semantic segmentation network based on spatial-spectral attention, A2SegNet, is proposed to address information overload in deep convolutional neural networks. Attention in deep learning mimics the working principle of the human visual system: in complex scenes, attention naturally focuses on important information based on prior knowledge, while secondary information is ignored. Convolutional neural networks with attention mechanisms balance efficiency and accuracy, and address intra-class heterogeneity by inserting attention into the multi-level receptive field and feature fusion stages. In a crop classification experiment on hyperspectral images, the lightweight A2SegNet achieved higher inference efficiency and classification accuracy than SegNet, with good stability. (4) Convolution is good at extracting local features but limited in modeling long-range dependencies. To address this, this paper proposes DE-UNet, a dual-encoder semantic segmentation network that combines a Swin Transformer with a convolutional neural network. The Swin Transformer pays 
attention to multi-scale global features, while the convolutional neural network learns local features, so the fused features capture both global and local context. In the experiment, UAV images containing only visible bands were used to test three models, including DE-UNet. Even with this limited feature set, DE-UNet achieved high classification accuracy, with an overall accuracy 0.28% and 4.81% higher than that of UNet and UNet++, respectively, which indicates that introducing the Transformer enhances the fitting ability of the model. In summary, this paper proposes four networks, CSNet, MSSNet, A2SegNet and DE-UNet, which explore model depth, model width, attention mechanisms and the combination of Transformers with convolutional neural networks, respectively. The models were evaluated in crop classification experiments on multi-source remote sensing data. The experiments show that the proposed algorithms have good application prospects for crop classification and can serve as a supplementary monitoring tool for crop area extraction and agricultural subsidy distribution. |
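
The overall accuracy (OA) used to compare the models above is conventionally the fraction of correctly classified pixels. The following is a minimal sketch of that calculation, assuming the predictions have already been summarized in a confusion matrix. The function name `overall_accuracy` and the example counts are hypothetical and are not taken from the experiments in this paper.

```python
from typing import Sequence


def overall_accuracy(confusion: Sequence[Sequence[int]]) -> float:
    """Return OA = correctly classified pixels / all pixels.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j (assumed convention).
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    # Correct predictions lie on the main diagonal.
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Hypothetical 3-class example with made-up pixel counts.
example = [
    [50, 2, 3],
    [4, 40, 1],
    [6, 0, 44],
]
oa = overall_accuracy(example)
print(f"OA = {oa:.3f}")
```

Because OA weights every pixel equally, it can be dominated by the most common class. Per-class measures are often reported alongside it when the classes are imbalanced.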