Rocky desertification is an extreme form of land degradation in karst areas. Known as the "cancer" of the earth, it seriously threatens ecological security, affects agricultural production, and even endangers human survival. Controlling rocky desertification has become one of the important problems to be solved in the construction of ecological civilization. Accurate investigation is an important prerequisite for control, and the bare rock rate is an important indicator for grading rocky desertification, so extracting bare rock information accurately is essential. At present, bare rock information is extracted mainly by remote sensing and ground survey. Remote sensing can capture surface information over a wide area but performs poorly on fine-grained bare rock information, so it has to be combined with high-resolution images and ground survey. Some studies have used high-resolution UAV images to extract bare rock information, but they rely mainly on traditional image processing methods and their extraction accuracy remains limited. Ground survey yields more accurate information but is time-consuming, costly, and difficult to carry out in mountainous terrain. To address these problems, this paper uses high-resolution RGB three-channel aerial images of karst rocky desertification areas to build a bare rock dataset. Based on deep learning theory and algorithms, different semantic segmentation models are constructed to extract bare rock information in multiple scenes. Comparative experiments against commonly used semantic segmentation models on the self-built dataset verify the superiority of the models constructed in this paper, which achieve high-precision extraction of bare rock information in karst rocky desertification areas. The main work is as follows:

(1) To address the lack of datasets for bare rock extraction, a dataset is constructed with bare rock targets in karst rocky desertification areas as semantic labels. First, labels are created by manually outlining bare rock on high-resolution RGB three-band images. Second, the data is augmented to obtain 2000 labeled images, which are used in the subsequent research.

(2) Traditional extraction methods cannot fully extract deep semantic features from images and perform poorly on bare rock extraction. To address this, a deep learning approach is adopted and the CAFM3-DeepLab V3+ model is constructed. The model is optimized on the basis of DeepLab V3+, whose atrous spatial pyramid pooling module can effectively obtain target information at different scales. CAFM3-DeepLab V3+ uses an improved lightweight MobileNet V3 as the feature extraction network and combines a feature pyramid with an attention mechanism to enhance feature extraction. Experimental results show that CAFM3-DeepLab V3+ achieves an intersection over union of 72.46% and an F1-score of 84.03%, which are 4.62 and 3.19 percentage points higher than the original model, respectively. Furthermore, CAFM3-DeepLab V3+ outperforms other commonly used models while having only 1/13 of the parameters of the original DeepLab V3+ model.

(3) The improved CAFM3-DeepLab V3+ model has weak recognition ability for objects similar in color to bare rock, misses bare rock in complex areas, and has low extraction accuracy there. To address these problems, the CRCU-Net model is constructed. The model is optimized on the basis of U-Net. CRCU-Net uses the deeper ResNet101 as the feature extraction network, which compensates for the original U-Net's insufficiently accurate representation of image features against complex backgrounds. An attention mechanism is used to capture the relationships between feature-map channels and spatial location information. Finally, the content-aware reassembly of features upsampling operator is used to recover features in the decoder, making effective use of the semantic information in the feature map. The experimental results show that CRCU-Net improves the ability to distinguish objects that closely resemble bare rock, greatly reduces false extractions, and further improves segmentation accuracy. The intersection over union and F1-score of the model reach 74.11% and 85.13%, respectively. These are 3.64 and 2.45 percentage points higher than the original U-Net model, and 1.65 and 1.10 percentage points higher than the CAFM3-DeepLab V3+ model.

(4) Convolution-based semantic segmentation models have weak ability to capture long-range information and low segmentation accuracy under complex conditions. To address this, the ViT-CRCU-Net model is constructed, combining the advantages of convolutional neural networks and the Vision Transformer. ViT-CRCU-Net uses the convolution-based CRCU-Net as the main body of the model to acquire local image details and high-level semantic information. A Vision Transformer module is added at its end to capture global features and long-range information. In addition, the pre-trained weights of CRCU-Net are used during training, and the model is trained with a freezing training strategy. The experimental results show that this training strategy improves both the efficiency and the accuracy of model training. ViT-CRCU-Net achieves the best segmentation performance in complex backgrounds and in scenes with strongly interfering targets, enabling more accurate segmentation of bare rock. Compared with the CRCU-Net model, its intersection over union and F1-score improve by 1.04 and 0.68 percentage points, respectively.
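The two metrics reported above, intersection over union (IoU) and F1-score, can be illustrated with a minimal Python sketch. It assumes binary masks in which 1 marks bare rock and 0 marks background, and it assumes both metrics are computed from the same pooled pixel counts for the bare rock class; the abstract does not state how the metrics were aggregated. The function names and the toy masks are hypothetical and serve only as illustration. Under these assumptions the two metrics are linked by F1 = 2·IoU / (1 + IoU), and the reported pairs (72.46% with 84.03%, 74.11% with 85.13%) are consistent with that relation.

```python
def confusion_counts(pred, label):
    """Count TP, FP, FN for the bare-rock class (value 1) in two binary masks."""
    tp = fp = fn = 0
    for pred_row, label_row in zip(pred, label):
        for p, t in zip(pred_row, label_row):
            if p == 1 and t == 1:
                tp += 1          # bare rock predicted and present
            elif p == 1 and t == 0:
                fp += 1          # bare rock predicted but absent
            elif p == 0 and t == 1:
                fn += 1          # bare rock present but missed
    return tp, fp, fn


def iou(tp, fp, fn):
    """Intersection over union: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def f1_score(tp, fp, fn):
    """F1-score (harmonic mean of precision and recall): 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_from_iou(iou_value):
    """Single-class identity, valid when both metrics share the same counts."""
    return 2 * iou_value / (1 + iou_value)


# Toy 3x4 masks (illustrative only, not data from the paper).
pred = [[1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 1, 1]]
label = [[1, 1, 1, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 1]]

if __name__ == "__main__":
    tp, fp, fn = confusion_counts(pred, label)
    print("TP, FP, FN:", tp, fp, fn)
    print("IoU:", round(iou(tp, fp, fn), 4))
    print("F1:", round(f1_score(tp, fp, fn), 4))
    # Consistency check on the IoU values reported in the abstract.
    for reported_iou in (0.7246, 0.7411):
        print(reported_iou, "->", round(f1_from_iou(reported_iou), 4))
```

Because F1 is a monotonic function of IoU under these assumptions, the two metrics rank models identically; reporting both mainly helps comparison with work that publishes only one of them.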