With the advantages of high permeability, low compressibility, high shear strength and low cost, gravelly soils are widely used worldwide in dam construction, embankment filling, highways, railway roadbeds, marine reclamation and other fields. Once liquefaction occurs, the resulting economic losses are severe. It is therefore important to study models for evaluating the liquefaction of gravelly soils. The sample size and data quality of the gravelly soil liquefaction database are essential to the predictive performance of such models. However, the number of existing historical liquefaction samples is insufficient to guarantee the predictive accuracy of gravelly soil liquefaction discrimination models, and the quality of these historical data has not been assessed either qualitatively or quantitatively. Furthermore, existing liquefaction discrimination models do not account for the weights of the selected influencing factors. In this paper, the gravelly soil liquefaction database is expanded with new data generated by data expansion methods, and the impact of data quality on the performance of six commonly used supervised learning models is explored along three dimensions: uncertainty, uniqueness and outliers. The accuracy of the discrimination model is further improved, providing a new line of research for seismic disaster mitigation in engineering. The main research of this paper is as follows:

(1) The existing database of gravelly soil liquefaction is expanded. Building on 234 historical gravelly soil liquefaction records collected and collated through an extensive literature review, previously unused information in the soil data from liquefaction sites of the Wenchuan earthquake was extracted, and 122 new liquefaction samples were generated using a Markov chain Monte Carlo (MCMC) algorithm combined with a hybrid Bayesian network. The validity of these new samples was tested through experiments. The expanded data were added to the original gravelly soil liquefaction database, increasing the number of liquefaction samples and improving the distribution of the liquefaction variables.

(2) Three dimensions, namely uncertainty, uniqueness and outliers, were selected to explore how the quality of gravelly soil liquefaction data affects the performance of commonly used supervised learning models. The results show that outliers in the dataset reduce the learning performance and generalisation ability of supervised classification models. In addition, a large number of duplicate samples in the database improves the learning performance of supervised classification models but reduces their predictive performance. Finally, in terms of model performance, the best results are achieved when the training set contains only two types of samples, A (lower uncertainty) and B (moderate uncertainty), in a ratio of about 1:1. When the training set contains all three types, A, B and C (higher uncertainty), the recommended proportions for the six common supervised classification models are 10%-20% for class A, 70%-80% for class B and 5%-10% for class C, which leads to better prediction performance.

(3) A hierarchical weighted Bayesian network (BN) method was proposed, and a hierarchical weighted BN model for discriminating the liquefaction of gravelly soils was constructed on this basis. The model overcomes a limitation of existing BN models for liquefaction prediction, which cannot account for the weights of the liquefaction influencing factors. By drawing on the approach used in liquefaction discrimination codes, it also greatly simplifies earlier BN liquefaction discrimination models while maintaining highly accurate discrimination results. A simple program implementing the hierarchical weighted BN discrimination model was also developed in Visual Basic for Applications (VBA) within Microsoft Excel. Comparison with three existing gravelly soil liquefaction discrimination models and four other discrimination models based on machine learning methods shows that the hierarchical weighted BN model constructed in this paper achieves better prediction performance and can also give the probability of liquefaction occurring. Finally, the applicability of the model was tested using the latest publicly available liquefaction data for gravelly soils from the Wenchuan earthquake, and the results show that the model has reliable generalisation performance.
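The abstract does not describe how the MCMC algorithm and the hybrid Bayesian network were combined to generate new samples. As a rough, hypothetical illustration of the general idea only, the sketch below uses Gibbs sampling (one MCMC method) on a small, purely discrete Bayesian network to draw completions of unobserved site variables conditioned on the variables that were recorded. The variable names `G`, `W`, `S` and `L`, the network structure and every probability in the tables are assumptions made for illustration, not values from this study; a hybrid network with continuous nodes would need a different conditional model for those nodes.

```python
import random

# HYPOTHETICAL binary Bayesian network: variable names, structure and
# probabilities are illustrative assumptions, not data from the study.
PARENTS = {
    "G": [],               # 1 = high gravel content, 0 = low
    "W": [],               # 1 = shallow groundwater table, 0 = deep
    "S": [],               # 1 = strong shaking, 0 = moderate
    "L": ["G", "W", "S"],  # 1 = liquefaction observed, 0 = not observed
}

# P(var = 1 | parent values), keyed by the parent values in PARENTS order.
CPT = {
    "G": {(): 0.5},
    "W": {(): 0.6},
    "S": {(): 0.4},
    "L": {
        (0, 0, 0): 0.05, (0, 0, 1): 0.30, (0, 1, 0): 0.20, (0, 1, 1): 0.70,
        (1, 0, 0): 0.02, (1, 0, 1): 0.15, (1, 1, 0): 0.10, (1, 1, 1): 0.50,
    },
}

# Child lists, needed to form each variable's Markov blanket.
CHILDREN = {v: [c for c, ps in PARENTS.items() if v in ps] for v in PARENTS}


def cond_prob(var, value, state):
    """P(var = value | parents of var) under the full assignment `state`."""
    p1 = CPT[var][tuple(state[p] for p in PARENTS[var])]
    return p1 if value == 1 else 1.0 - p1


def gibbs(evidence, n_samples=2000, burn_in=500, seed=0):
    """Gibbs-sample the unobserved variables, holding `evidence` fixed."""
    rng = random.Random(seed)
    hidden = [v for v in PARENTS if v not in evidence]
    state = dict(evidence)
    for v in hidden:
        state[v] = rng.randint(0, 1)  # arbitrary starting point
    samples = []
    for step in range(burn_in + n_samples):
        for v in hidden:
            # Unnormalised P(v | Markov blanket) for v = 0 and v = 1.
            weights = []
            for value in (0, 1):
                state[v] = value
                w = cond_prob(v, value, state)
                for c in CHILDREN[v]:
                    w *= cond_prob(c, state[c], state)
                weights.append(w)
            p_one = weights[1] / (weights[0] + weights[1])
            state[v] = 1 if rng.random() < p_one else 0
        if step >= burn_in:  # discard burn-in sweeps
            samples.append(dict(state))
    return samples
```

For example, `gibbs({"S": 1, "L": 1})` returns draws in which strong shaking and observed liquefaction stay fixed while `G` and `W` are resampled. With the illustrative tables above, the fraction of draws with `W = 1` should approach the exact posterior P(W = 1 | S = 1, L = 1) = 0.36 / 0.45 = 0.8. Each draw can be read as one candidate completed record, which is the sense in which such sampling could "expand" a partially observed dataset.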