| A landslide inventory is indispensable for landslide susceptibility prediction (LSP) modeling. However, for a large study area, it is difficult to obtain a complete landslide inventory. To address this problem, this paper examines the influence of an incomplete landslide inventory on LSP modeling and the rules by which the resulting error is transferred through the model. The study also attempts to reduce this influence by identifying a number of potential landslides and adding them to the inventory as new samples. Taking Xunwu County, China, as an example, the existing landslide inventory is first obtained and assumed to contain all landslide inventory samples under ideal conditions, and different missing-sample conditions are then simulated by random sampling. These comprise conditions in which the landslide inventory samples across the whole of Xunwu County are missing randomly at proportions of 10%, 20%, 30%, 40% and 50%, and a condition in which the landslide inventory samples in the south of Xunwu County are missing in aggregation. Then, five machine learning models, namely Random Forest (RF), Back Propagation Neural Network (BPNN), Logistic Regression (LR), Long Short-Term Memory (LSTM), and Support Vector Machine (SVM), are used to perform LSP modeling. The LSP results are evaluated using Receiver Operating Characteristic (ROC) accuracy, the mean and standard deviation, and spatial difference methods in order to analyze the uncertainties of LSP modeling under the various conditions. In addition, this paper introduces several machine learning interpretability methods, such as permutation factor importance, partial dependence plots, the H-statistic, and SHapley Additive exPlanations (SHAP), to explore how the decision basis of the RF model changes and thereby summarize the transfer rules of inventory error in the model. Finally, this paper attempts to use SBAS-InSAR to obtain the surface deformation rate and cumulative deformation in some areas 
of Xunwu County, and combines the LSP results with high-resolution remote sensing images to interpret potential landslides, in order to explore the feasibility of using potential landslides as inventory expansion samples in LSP when the landslide inventory is incomplete. The main results are as follows: (1) When a certain proportion (10% to 50%) of landslide inventory samples is randomly missing, the LSP results are affected in local areas, and this effect weakens radially outward from the missing samples. Additionally, although the ROC accuracy shows that the prediction results for the entire area remain reliable, a landslide susceptibility map based on an incomplete landslide inventory may sometimes appear overly optimistic in terms of ROC accuracy. (2) Landslide inventory samples that are missing in aggregation may cause significant biases in LSP, particularly in the areas where samples are missing. In terms of ROC accuracy, when the landslide inventory samples in the south of Xunwu County are missing in aggregation, the LSP accuracy of the entire region decreases significantly (by 6.1% for the RF model, 6% for BPNN, 4% for LSTM, 3.5% for SVM, and 1.5% for LR). This shows that, when the number of landslide inventory samples is fixed, the spatial uniformity of the sample distribution is a crucial factor in determining the accuracy of the LSP results. (3) Complex models may be more sensitive to missing landslide inventory samples. When the landslide inventory samples in the south of Xunwu County are missing in aggregation, the RF model experiences the largest decrease in LSP accuracy among all models (a 6.1% reduction in the overall ROC accuracy and a 12% reduction in the ROC accuracy for southern Xunwu County). When 50% of the landslide inventory samples are randomly missing, neural network-based algorithms are the most affected. In contrast, although the Logistic Regression (LR) model has a lower ROC accuracy, its 
simple structure gives it higher "stability." When landslide inventory samples are missing in aggregation, its overall ROC accuracy decreases by only 1.5%. When 50% of the landslide inventory samples are randomly missing, almost no grid units have their susceptibility values overestimated or underestimated. (4) When 50% of landslide samples are missing (either randomly or in aggregation), the changes in the decision basis of the RF model are mainly manifested in three ways: first, the importance ranking of environmental factors differs slightly; second, the interaction strength between most environmental factors is weakened; third, for LSP on the same test grid unit, the weights of individual factors in the model may vary drastically. In addition, the deviation in the marginal effect of the elevation factor may be one of the important reasons for the change in the model's decision-making criteria when the landslide inventory samples are missing in aggregation. (5) Based on the surface deformation information and LSP results, this study visually interpreted 32 potential landslides. After these 32 potential landslides were added as new samples to the inventory in which the samples in the south of Xunwu County are missing in aggregation, and the RF model was rebuilt, the ROC accuracy of the RF model for the entire area increased by 2%, and the prediction accuracy in the southern part of Xunwu County (where landslide samples are missing in aggregation) increased by 4%. This indicates that adding potential landslides interpreted from SBAS-InSAR and LSP results to the landslide inventory as new samples can improve LSP accuracy when landslide inventory samples are missing in aggregation. However, the increase in accuracy is modest, and the effectiveness of this method still needs to be improved. |
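
The two kinds of missing-sample condition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic inventory, the function names `drop_random` and `drop_aggregated`, and the `y_cutoff` used to mark the "south" are all hypothetical, and the real study works on the Xunwu County inventory rather than random points.

```python
import random


def drop_random(samples, proportion, seed=0):
    """Return the samples left after removing `proportion` of them at random."""
    rng = random.Random(seed)
    n_drop = round(len(samples) * proportion)
    dropped = set(rng.sample(range(len(samples)), n_drop))
    return [s for i, s in enumerate(samples) if i not in dropped]


def drop_aggregated(samples, y_cutoff):
    """Return the samples left after removing every sample south of `y_cutoff`.

    Assumption: each sample is an (x, y) pair with y increasing northward.
    """
    return [s for s in samples if s[1] >= y_cutoff]


# Hypothetical inventory: 200 landslide points in a unit square.
rng = random.Random(42)
inventory = [(rng.random(), rng.random()) for _ in range(200)]

# Random missing at the five proportions named in the abstract.
random_cases = {p: drop_random(inventory, p) for p in (0.1, 0.2, 0.3, 0.4, 0.5)}

# Missing in aggregation: the southern part of the area loses all its samples.
aggregated_case = drop_aggregated(inventory, y_cutoff=0.3)
```

Each reduced inventory would then be used to train the LSP models, with the full inventory serving as the reference for comparing ROC accuracy and spatial differences.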