| As transportation becomes more widespread, improving pavement quality has become an active research topic. Varying conditions make pavement information collection systems prone to failure, which produces abnormal data, so detecting and repairing such data is important. With the development of data mining technology, machine learning has made significant progress in detecting and repairing abnormal pavement data. This paper proposes and studies in depth a machine-learning-based method for identifying and repairing abnormal states in asphalt pavement perception data. First, a one-dimensional perception data set of asphalt pavement temperature and humidity is constructed, and three statistical algorithms (the 3σ criterion, the box plot and the t-test criterion) are each used to detect and analyse outliers in the one-dimensional temperature and humidity data. Next, two-dimensional feature vectors are constructed from the one-dimensional data using first-order difference and linear summation features. Outlier detection is then carried out with the Local Outlier Factor (LOF) algorithm, K-Means and One-Class Support Vector Machine (One-Class SVM), and the models are evaluated and analysed using the Davies-Bouldin Index (DBI) as the outlier detection evaluation index. The number of neighbourhood points (K) of the LOF model is optimised using DBI to obtain the K value that gives the best detection performance. An asphalt pavement temperature and humidity anomaly repair sample database is then constructed; the CART, Random Forest, XGBoost, LSTM and LightGBM algorithms are used for prediction on this database, and grid search (GS) is used to optimise the parameters of these models. The model predictions are assessed with evaluation metrics. The experimental results show that the GS-XGBoost model has the
highest coefficient of determination R² for predicting temperature and humidity, 0.968 and 0.917 respectively, and that its Mean Absolute Error and Root Mean Square Error are also better than those of the other models. Finally, because grid search is limited in its ability to find the optimum, the XGBoost model parameters are instead optimised using a genetic algorithm (GA), which determines the best model parameters through selection, crossover and mutation. The comparison shows that, relative to the GS-XGBoost model, the GA-XGBoost model reduces the temperature and humidity Root Mean Square Error by 1.16% and 6.21% and improves speed by 5.13 and 5.7 times, respectively. A stability analysis of the data before and after repair was also carried out using the smoothness index; the results show that the mean and median of the repaired data increase while the standard deviation and coefficient of variation decrease, so the data are more stable overall. In summary, the proposed DS-LOF model identifies outliers in time series with higher accuracy and is applied to anomaly detection in one-dimensional asphalt pavement temperature and humidity data; meanwhile, the GA-XGBoost model proposed in this paper is able to predict and repair the abnormal data, which improves the stability of asphalt pavement temperature and humidity data and has important reference value for the mining of pavement data information in China. |
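
The two simplest statistical detectors named in the abstract, the 3σ criterion and the box plot rule, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of the population standard deviation, the 1.5 × IQR whisker factor, and the `readings` values are all assumptions chosen for illustration.

```python
import statistics


def three_sigma_outliers(values):
    """Return indices of values more than 3 standard deviations from the mean.

    Assumption: the population standard deviation is used; the paper may
    use the sample standard deviation instead.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, so nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - mean) > 3 * std]


def boxplot_outliers(values, whisker=1.5):
    """Return indices of values outside [Q1 - whisker*IQR, Q3 + whisker*IQR].

    Assumption: the conventional whisker factor of 1.5 and the "inclusive"
    quartile method; neither is stated in the abstract.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical pavement temperature readings (degrees Celsius) with one
# injected sensor fault at index 10. These are not data from the paper.
readings = [25.0, 25.2, 24.8, 25.1, 24.9] * 4
readings[10] = 60.0

print("3-sigma flags:", three_sigma_outliers(readings))
print("box plot flags:", boxplot_outliers(readings))
```

Both rules look only at the overall distribution of values and ignore their order in time, which is one reason to move on to difference-based features and density-based methods such as LOF, as the abstract describes.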