| In geomorphology, a landslide is defined as the sliding of a rock or soil mass on a slope along a weak surface under the action of gravity, triggered by certain disaster-inducing conditions. Landslides are characterized by sudden onset, a wide hazard range, and a tendency to cause secondary disasters. They can be triggered by a variety of factors, such as short-term heavy rainfall and earthquakes, and are closely related to the local geological and geomorphological setting, which makes them a complex hazard that is difficult to control. Landslide identification and mapping focuses on depicting the location, extent, and type of landslides, and is the cornerstone of risk, hazard, and vulnerability evaluation in subsequent disaster prevention work. It is therefore of great value to study the performance of landslide identification models in order to create detailed and complete landslide inventory maps for disaster prevention and mitigation. Over decades of development, landslide identification has progressed from traditional interpretation methods to intelligent extraction based on machine learning models, with continuous improvements in efficiency, accuracy, and automation. However, when landslide identification is based on intelligent extraction models, researchers often ignore the sample imbalance in landslide datasets, which limits the learning ability of the models and leads to low accuracy. This study addresses these problems in three aspects: landslide sample selection, imbalanced dataset processing, and optimization strategy. It applies traditional data balancing algorithms and a sample expansion strategy based on generative adversarial networks to landslide identification, in order to explore the learning potential of the models and further improve identification efficiency and mapping accuracy. In this study, 20
landslide factors are extracted from the low-dimensional and high-dimensional imbalanced landslide datasets of the Three Gorges Reservoir area and the Jiuzhaigou region, combining multiple sources such as optical remote sensing images, DEMs, and geological maps, and the landslide sample data are overlaid for landslide identification and mapping. The main research contents and results are as follows: (1) Landslide detection and mapping based on traditional data balancing algorithms. Seven traditional data balancing algorithms are used to balance the landslide datasets: five oversampling algorithms (ROS, SMOTE, Borderline-SMOTE1, Borderline-SMOTE2, and ADASYN), one undersampling algorithm (RUS), and one hybrid sampling algorithm (SMOTE-Tomek). SVM, Random Forest, GBDT, XGBoost, CNN, and DenseNet are then used for landslide identification to further explore the learning potential of the models. The experimental results show that, in the two regions, the models trained on the balanced datasets improve AUC by 2.74%-15.53% and the Kappa coefficient by 0.12%-10.50% compared with the initial machine learning models. The deep learning models improve AUC by 1.14%-7.25% and the Kappa coefficient by 0.15%-5.73%. This shows that the traditional data balancing algorithms are generally applicable to the imbalance problem in landslide datasets. (2) Landslide identification and mapping based on a Generative Adversarial Network (GAN) sample expansion strategy. The Smo-SE-WGAN framework is constructed by adding a SMOTE module, changing the loss function to the Wasserstein distance, and adding an SE module for the one-dimensional landslide data features, in order to enhance the representational ability of the network and generate pseudo-data closer to the real distribution, further exploring the learning potential of the models. The experimental results show that, in the two regions, the machine learning models using the Smo-SE-WGAN framework improve AUC by 0.27%-2.38% and the Kappa coefficient by 0.51%-2.17% compared with the traditional data
balancing algorithms. The deep learning models improve AUC by 0.12%-1.56% and the Kappa coefficient by 0.53%-0.87%. Overall, the optimized Smo-SE-WGAN framework generates pseudo-data that are closer to the real distribution of landslides than those produced by the traditional data balancing algorithms, which makes fuller use of the learning potential of the models and improves landslide identification accuracy and mapping quality, giving it strong practical value in the field of landslide identification. |
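
The oversampling idea behind SMOTE, one of the traditional balancing algorithms named above, can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names `smote` and `balance`, the parameter defaults, and the toy two-feature samples are all assumptions made for illustration, and the variant shown is plain SMOTE rather than Borderline-SMOTE, ADASYN, or the Smo-SE-WGAN framework. Each synthetic landslide sample is placed at a random point on the segment between a real minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples."""
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return list(X) + new, list(y) + [minority_label] * len(new)


# Toy data (made up): 6 non-landslide samples (0) and 3 landslide samples (1),
# each described by two hypothetical factor values.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.4, 0.2], [0.3, 0.5], [0.5, 0.4],
     [2.0, 2.0], [2.2, 2.1], [2.1, 2.4]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = balance(X, y)
```

After the call, `y_bal` contains six samples of each class, and the three appended samples lie between the original landslide samples in feature space. For real datasets, an established library such as imbalanced-learn, which provides SMOTE, Borderline-SMOTE, ADASYN, and SMOTE-Tomek samplers, would be the more reasonable choice than this sketch.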