The rapid development of modern industry and the acceleration of urbanization have put increasing pressure on the environment, and air pollution is among the most serious of the resulting problems. Many regions have now established relatively complete air quality monitoring systems, which have accumulated massive amounts of air quality data while providing the public with up-to-date air quality information. These historical data not only inform urban air pollution control but are also important material for air quality prediction. However, missing values, which arise for a variety of reasons, are common in monitoring data and reduce its application value, so filling in the missing data is important. This paper proposes a two-stage air quality data filling method consisting of two parts: the initialization of missing values and the update of filled values. In the first stage, the method takes the spatiotemporal characteristics of air quality data into account: it interleaves time-based and space-based KNN filling and continuously increases the number of neighbors to deal with the block missing problem. In the second stage, because the initialized data no longer contain missing values, more accurate machine learning models can be introduced to improve the accuracy of the filled values. At this stage, a random forest model and a BiLSTM model are used to handle the spatial and temporal correlations of the data, respectively. The random forest model, which is based on an ensemble algorithm, captures the connections between monitoring stations, while the bidirectional structure of the BiLSTM model makes full use of the information both before and after the missing values and achieves higher filling accuracy than a unidirectional structure. Finally, a combined weighting method based on least squares synthesizes the filling results of the two models, fusing the temporal and spatial perspectives. The experimental results show that, compared with eight benchmark models, the proposed two-stage filling method performs better under four different conditions: time block missing, space block missing, ordinary missing, and overall missing. The proposed method has a significant advantage for space block missing, where the MAE of filling is reduced by about 30% compared with the best benchmark model.
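The least-squares combination of the two models' filling results can be illustrated with a minimal sketch. It is not the exact weighting scheme used in this paper: it assumes that the two weights sum to one and lie in [0, 1], and that they are fitted on entries whose true values are known (for example, artificially masked observations). The function names `least_squares_weights` and `combine`, and the synthetic data, are hypothetical.

```python
import numpy as np


def least_squares_weights(pred_spatial, pred_temporal, truth):
    """Return (w_s, w_t) with w_s + w_t = 1 that minimize the squared
    error of w_s * pred_spatial + w_t * pred_temporal against truth.

    Assumption: weights are constrained to sum to one and clipped to [0, 1].
    """
    d = pred_spatial - pred_temporal
    denom = float(d @ d)
    if denom == 0.0:
        # Both models agree everywhere, so any split gives the same blend.
        return 0.5, 0.5
    # Closed-form minimizer of ||truth - pred_temporal - w_s * d||^2.
    w_s = float(d @ (truth - pred_temporal)) / denom
    w_s = min(1.0, max(0.0, w_s))
    return w_s, 1.0 - w_s


def combine(pred_spatial, pred_temporal, w_s, w_t):
    """Blend the spatial and temporal filling results."""
    return w_s * pred_spatial + w_t * pred_temporal


# Synthetic stand-ins for a random forest (spatial) and a BiLSTM (temporal)
# prediction on entries with known true values.
rng = np.random.default_rng(0)
truth = rng.normal(50.0, 10.0, size=200)
pred_spatial = truth + rng.normal(0.0, 4.0, size=200)
pred_temporal = truth + rng.normal(0.0, 2.0, size=200)

w_s, w_t = least_squares_weights(pred_spatial, pred_temporal, truth)
fused = combine(pred_spatial, pred_temporal, w_s, w_t)
```

Under these assumptions, the fused result has a squared error on the fitting entries no larger than that of either model alone, because the single-model cases (a weight of 0 or 1) are within the search range.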