Study On The Regional Debris Flow Susceptibility Evaluation Based On Machine Learning Methods

Posted on: 2024-01-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Y Gao
GTID: 1520307178496924
Subject: Civil engineering
Abstract/Summary:
Debris flow susceptibility evaluation is an important part of debris flow prevention and control. Reasonable and reliable evaluation results provide an important basis for regions to formulate scientific disaster prevention and mitigation programs. Benefiting from the development of remote sensing technology, geographic information systems, global positioning systems, and computers, evaluation techniques for debris flow susceptibility have gradually matured. Machine learning methods have been widely used in recent years because of advantages such as powerful nonlinear processing capability and robustness. However, most existing studies model and analyze a single study area, and several difficulties remain in practical applications. For example, a small amount of defective data can severely affect model performance, and the samples may be underrepresentative when the number of debris flow samples in a single study area is too small. In addition, when researchers face multiple study areas, a model established on a single study area usually lacks the ability to generalize to the others. Because the purpose of debris flow susceptibility evaluation is to provide a unified decision-making basis for land use planning and disaster prevention and mitigation across all study areas, evaluating multiple study areas independently also leads to inconsistent evaluation criteria, which affects the reasonableness of the final decisions.

This study first sought to improve model performance, taking Fangshan District of Beijing as an example study area to optimize the negative sample acquisition strategy and the machine learning algorithms. Subsequently, a domain adaptation approach was introduced, and the transfer component analysis method was used to extract the common features of Fangshan District and the Yanzi River basin in Longnan District for joint evaluation, providing a theoretical approach for multi-regional debris flow susceptibility evaluation. To reduce the impact of sample heterogeneity on the model, Beichuan County in Sichuan Province was taken as an example study area to explore the application of unsupervised clustering algorithms to the sample heterogeneity problem. The improvement of model performance, the domain adaptation of different study areas, and the solution to the sample heterogeneity problem enabled the machine learning model to be generalized to multiple study areas. Finally, the above research results were combined to complete the multi-regional debris flow susceptibility evaluation. The main work of this study is as follows:

1. A systematic review was conducted of the development of debris flow susceptibility evaluation, domain adaptation methods for different study areas, and solutions to sample heterogeneity. The data collection and processing methods used in debris flow susceptibility evaluation, as well as the methods for establishing and evaluating models, were also elaborated in detail.

2. Various debris flow negative sample acquisition strategies were proposed to improve the reliability of negative samples. Taking Fangshan District as an example study area, negative debris flow samples were obtained based on the support vector machine (SVM) algorithm, the SPY technique, and the Isolation forest (IF) algorithm, each with the single grid, multi-grid, and watershed unit as the basic sample representation. The negative samples were then combined with the corresponding positive debris flow samples to form nine modeling datasets. The datasets were used for factor analysis and modeling to compare the advantages and shortcomings of the different negative sample acquisition strategies. The results showed that the SVM-based strategy depends more heavily on the performance of the classifier, and its overall performance is unstable. The strategy based on the SPY technique has lower dataset requirements and does not depend significantly on the algorithm; it can also improve the quality of multiple datasets. The assumptions of the IF-based strategy fit the watershed unit datasets well, so this strategy can be used to improve the quality of those datasets.

3. Combining the nine datasets formed by different sample representations and negative sample acquisition strategies, 36 machine learning models were trained and evaluated based on the SVM, Random forest (RF), Gradient boosting decision tree (GBDT), and Stacking algorithms. The results showed that the Stacking model has a clearer advantage in prediction accuracy across multiple datasets, but its model complexity is significantly greater than that of the other models, and its training and prediction efficiency is much lower. The RF model, by contrast, offers a more balanced performance in terms of prediction accuracy, model complexity, and training and prediction efficiency, so it is a high-quality algorithm that is easy to generalize.

4. Domain adaptation theory was introduced, and transfer component analysis was adopted to extract the common features of different study areas so that they could be evaluated jointly. Taking Fangshan District and the Yanzi River basin as the example study areas, the feature matrices of the two study areas were projected into a common feature space based on transfer component analysis. RF was selected as the modeling algorithm to establish a unified model from the samples of both study areas, and the model was then compared with traditional models built from the samples of a single study area. The results showed that although the prediction accuracy of the unified model is not as good as that of the traditional models, the unified model alleviated the sample shortage problem of a single study area and improved modeling efficiency. Moreover, the sensitivity, specificity, accuracy, and AUC of the unified model reached 82.2%, 79.6%, 80.6%, and 0.84, respectively, which is still satisfactory.

5. A solution to the heterogeneity of debris flow samples based on an unsupervised clustering method was proposed. Taking Beichuan County as an example study area, the fuzzy C-means clustering algorithm was selected to divide the samples of the study area into four categories. Conditioning factor analysis and personalized modeling were carried out for each category of dataset, and the results were compared with the conditioning factors and the global model obtained from the total dataset. The results showed that the predictive ability of the same conditioning factor differs considerably between categories, which demonstrates the strong heterogeneity of the debris flow samples in the area. In addition, most of the conditioning factors have stronger predictive ability within each category than in the total dataset, and the overall performance of the final personalized model is better than that of the global model. The fuzzy C-means clustering algorithm therefore has good application prospects for solving the sample heterogeneity problem.

6. A multi-regional debris flow susceptibility evaluation was carried out. With the watershed unit as the sample representation, the common features of the three study areas were extracted based on the transfer component analysis method. The samples of the three study areas were then clustered as a whole with the fuzzy C-means clustering method to reduce the effect of sample heterogeneity on the model. Common feature analysis and personalized modeling were carried out on the samples of each category. In this process, the negative samples were obtained based on the IF algorithm, and RF, with its more balanced performance, was selected as the modeling algorithm. The models were finally compared with the traditional global models built on a single study area. The results showed that the multi-regional personalized model outperformed the traditional global model based on a single study area.

These results further validated the rationality and effectiveness of the multi-regional debris flow susceptibility evaluation method proposed in this paper, and could promote the application of machine learning in the field of debris flow susceptibility evaluation.
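
The sensitivity, specificity, accuracy and AUC quoted in the abstract for the unified model are standard binary-classification measures. As a minimal sketch of how such figures can be computed from debris flow labels and predicted susceptibility scores, the following Python snippet may help. It is not the author's code: the function name `evaluate`, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration and are unrelated to the dissertation's data.

```python
def evaluate(y_true, scores, threshold=0.5):
    """Return sensitivity, specificity, accuracy and AUC for binary labels.

    y_true    : list of 0/1 labels (1 = debris flow sample, 0 = negative sample)
    scores    : list of predicted susceptibility scores in [0, 1]
    threshold : assumed cut-off for turning a score into a class label
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")

    # AUC as the probability that a randomly chosen positive sample scores
    # higher than a randomly chosen negative sample (ties count as 0.5).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )

    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "accuracy": (tp + tn) / len(y_true),
        "auc": wins / (len(pos) * len(neg)),
    }


# Toy example (made-up values, not results from the dissertation).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
metrics = evaluate(toy_labels, toy_scores)
```

Sensitivity, specificity and accuracy depend on the chosen threshold, while AUC summarizes ranking quality over all thresholds, which is why the abstract can report the first three as percentages and AUC as a separate value.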
Keywords/Search Tags: Machine learning, Debris flow, Susceptibility evaluation, Domain adaptation, Heterogeneity