In recent years, images obtained from drone (UAV) platforms have gradually become a mainstream data source for precision agriculture owing to their high spatial resolution and rich geometric texture. However, crop classification at ultra-high resolution faces several challenges: inter-class differences shrink while intra-class differences grow, making crops with similar textures harder to distinguish and producing fragmented segmentation patches and ragged edges. Although convolutional neural networks have made continuous progress in crop classification research, they remain limited in modeling long-range dependencies and capturing global crop features. To address this, this paper proposes the MSAT (Multi-Scale Attention Network) model and the Deep Trans (DeepLabv3+ with Transformer) model to tackle unsmooth segmentation edges and fragmented prediction results. Both models significantly reduce crop misclassification; the MSAT model has fewer parameters and is lighter, while the Deep Trans model achieves better classification performance. The specific research content is as follows:

(1) To address inadequate edge-information extraction and the misclassification of crops with similar textures, which lead to suboptimal classification results, we improve the DeepLabv3+ model and propose the Multi-Scale Attention fusion model (MSAT). The MSAT model builds multi-scale convolutional attention modules to extract edge information effectively and improve crop classification accuracy. The multi-scale convolutional attention module obtains crop information at different scales within the same level by embedding multi-scale blocks: multi-scale feature maps are mapped into multiple sequences, each fed independently into a convolutional attention module. The convolutional attention mechanism strengthens the focus on contextual crop information, improving the model's extraction of edge information. Finally, local and global features are fused to capture both coarse and fine information. In addition, the backbone network is replaced with EfficientNet-B0, making the model lighter than DeepLabv3+. Evaluation on five crops (rice, sugarcane, corn, banana, and citrus) shows that the MSAT model attains higher classification accuracy, verifying that fine-grained crop classification based on high-resolution imagery is feasible at low equipment cost.

(2) To address fragmented patches in segmentation results, this paper introduces the Transformer into the DeepLabv3+ model and proposes a parallel-branch structure for UAV-image crop classification, the Deep Trans model. Deep Trans combines the Transformer and CNN in parallel, enabling effective capture of both global and local features. Introducing the Transformer strengthens long-range dependencies in the image, improving the extraction of global crop features. Channel and spatial attention mechanisms are added to enhance the Transformer's sensitivity to channel information and the Atrous Spatial Pyramid Pooling (ASPP) module's focus on target crops. Experiments show that the Deep Trans model achieves higher evaluation accuracy, with large gains for sugarcane, banana, and corn, crops that are easily misclassified. The Deep Trans model therefore yields better internal filling and global prediction in crop classification maps.
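The multi-scale attention idea described above, pooling a feature map to several scales, mapping each scale to a sequence, attending over each sequence independently, and fusing the results back with the local features, can be sketched as follows. This is a minimal toy illustration in NumPy, not the thesis implementation: the scales, the shared random projection weights, and the sum-based fusion are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_pool(x, k):
    """Average-pool an (H, W, C) feature map with window/stride k."""
    H, W, C = x.shape
    return x[:H - H % k, :W - W % k].reshape(H // k, k, W // k, k, C).mean(axis=(1, 3))

def self_attention(seq, Wq, Wk, Wv):
    """Scaled dot-product self-attention over an (N, C) token sequence."""
    q, k, v = seq @ Wq, seq @ Wk, seq @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

def multi_scale_attention(x, scales=(1, 2, 4)):
    """Toy multi-scale attention: pool to each scale, attend per scale,
    upsample back, and fuse with the local features by summation."""
    H, W, C = x.shape
    Wq, Wk, Wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
    fused = x.copy()                                # local branch
    for s in scales:
        pooled = avg_pool(x, s)                     # (H/s, W/s, C)
        h, w, _ = pooled.shape
        seq = pooled.reshape(h * w, C)              # map feature map to sequence
        out = self_attention(seq, Wq, Wk, Wv).reshape(h, w, C)
        up = out.repeat(s, axis=0).repeat(s, axis=1)  # nearest-neighbor upsample
        fused[:h * s, :w * s] += up                 # fuse global branch back in
    return fused

feat = rng.standard_normal((8, 8, 16))
out = multi_scale_attention(feat)
print(out.shape)  # (8, 8, 16)
```

Each scale sees the same channel dimension but a different spatial extent, so the coarser sequences are short and attend over the whole image cheaply, which is what lets attention capture the long-range context that a purely convolutional decoder misses.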