| With rapid economic development and urbanization, air pollution has become increasingly severe, especially pollution from fine particles such as PM2.5 (particulate matter with an aerodynamic equivalent diameter not exceeding 2.5 microns). Because of its negative impacts on regional ecology and the ambient environment, PM2.5 has raised great concern across the globe over the past decades. Given its adverse impacts on public health, monitoring PM2.5 concentration and, more importantly, forecasting regional PM2.5 concentration over the next few days are of great significance and practical value for preventing and controlling haze pollution and reducing population exposure risk. There are two typical approaches to PM2.5 concentration forecasting: numerical forecasting models and statistical prediction models. Built on physical and chemical principles, numerical models can predict PM2.5 concentration across the globe. Nonetheless, the quality of the simulated data is largely limited by emission inventories and often suffers from low accuracy. In contrast, a variety of statistical methods have been used to build PM2.5 forecasting models, and these models often depend less on input data and achieve high overall prediction accuracy. Owing to the lack of gap-free gridded PM2.5 concentration data, most previous studies used PM2.5 concentrations measured by ground air quality monitors as the major input data source, so their predictions fail to depict regional variations in PM2.5 concentration over space. To generate a gap-free, high-accuracy PM2.5 concentration forecasting dataset, this study fused multi-modal PM2.5 forecasts derived from multiple data sources, aiming to improve forecasting accuracy and spatial coverage simultaneously. The proposed methods were evaluated by predicting PM2.5 concentration over the next 72 hours in North China. The major findings and results of this study are summarized as
follows: (1) First, the accuracy of the PM2.5 concentration forecasts provided by the European Copernicus Atmosphere Monitoring Service (CAMS) was evaluated, and a machine learning-based bias correction model was developed to reduce their bias. The CAMS PM2.5 concentration forecasts were compared with PM2.5 concentrations measured by ground air quality monitors in China and the United States of America (U.S.A.) during 2017–2018. The results show that the raw CAMS PM2.5 concentration forecasts have low overall accuracy, with root-mean-square error (RMSE) ranging from 38.26 to 83.24 μg/m³ in China and from 8.30 to 16.76 μg/m³ in the U.S.A. Using the random forest method, a statistical bias correction model was established to mitigate the bias in the CAMS PM2.5 forecasts, taking historical PM2.5 concentration measurements, meteorological factors, and other auxiliary data as inputs. Ground validation indicates that the bias correction model substantially reduces the bias in the raw CAMS PM2.5 concentration forecasts, with RMSE reduced to 20.49–27.21 μg/m³ in China and 5.39–7.65 μg/m³ in the U.S.A. after correction. (2) By integrating a graph convolutional network (GCN) and a long short-term memory (LSTM) network, a site-based PM2.5 concentration forecast model was established to predict PM2.5 concentration over the next 72 hours from historical PM2.5 concentration data measured at ground air quality monitoring stations. Ground validation indicates that the proposed model effectively captures the complex PM2.5 variations in space and time over the Beijing-Tianjin-Hebei region, with RMSE for the 72-hour forecasts ranging from 13.32 to 27.26 μg/m³. Inter-comparison indicates that this error is lower than that of a model accounting only for temporal variations, an improvement that largely results from the consideration of spatial PM2.5 variations and the attention mechanism. (3) A grid-based PM2.5 concentration forecasting model was then established by
replacing the GCN in the site-based model with a convolutional neural network (CNN), aiming to address the sparse distribution of ground-based air quality monitors and the lack of gridded PM2.5 concentration forecasts. Hourly gap-free PM2.5 grids, generated from daytime aerosol optical depth retrieved from the Himawari-8 satellite, were used as the PM2.5 input, while the CNN was applied to extract large-scale spatial variation features of PM2.5. Validation shows that this grid-based model predicts short-term PM2.5 variations well in both space and time, with RMSE of the 72-hour PM2.5 concentration forecasts over the Beijing-Tianjin-Hebei region ranging from 17.08 to 28.15 μg/m³. More importantly, the predicted PM2.5 fields closely resemble the distribution of PM2.5 concentration measured at ground monitors. (4) Finally, using the optimal interpolation method, a multi-modal PM2.5 forecast fusion model was proposed to seamlessly blend the bias-corrected CAMS PM2.5 concentration forecasts with the site- and grid-based PM2.5 concentration forecasts, with the ultimate goal of generating gap-free, high-resolution PM2.5 ensemble forecasts. The results show that the ensemble forecasts outperform each single forecast in terms of spatial coverage and accuracy, with RMSE of 15.45–25.96 μg/m³. Overall, the deep learning-based PM2.5 forecasting models developed in this study provide useful examples to help other practitioners train their own air quality forecasting models in the future. Compared with PM2.5 forecasts derived from a single data source, the ensemble forecasts exhibit promising accuracy and better performance, and the dataset could serve as an important early warning indicator to support regional haze pollution prevention and control. |
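
The fusion step in finding (4) rests on optimal interpolation, which corrects a gridded background field with point values weighted by their assumed error statistics. The abstract does not give the fusion equations, so the following is only a minimal sketch of the textbook form, x_a = x_b + B Hᵀ (H B Hᵀ + R)⁻¹ (y − H x_b), on a one-dimensional grid. The function name `optimal_interpolation`, the Gaussian background error covariance, the uncorrelated observation errors, and all numbers are illustrative assumptions, not the configuration used in the study.

```python
import numpy as np

def optimal_interpolation(background, obs, obs_idx, coords,
                          sigma_b, sigma_o, length_scale):
    """Blend a gridded background forecast with point values (hypothetical helper)."""
    background = np.asarray(background, dtype=float)
    obs = np.asarray(obs, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = background.size

    # Background error covariance B: Gaussian decay with distance (assumed form).
    dist = np.abs(coords[:, None] - coords[None, :])
    B = sigma_b ** 2 * np.exp(-0.5 * (dist / length_scale) ** 2)

    # Observation operator H selects the grid cells that contain a point value.
    H = np.zeros((obs.size, n))
    H[np.arange(obs.size), obs_idx] = 1.0

    # Observation error covariance R: uncorrelated errors (assumed).
    R = sigma_o ** 2 * np.eye(obs.size)

    # Innovation: difference between point values and background at those cells.
    innovation = obs - H @ background

    # Solve (H B H^T + R) w = innovation instead of forming an explicit inverse.
    weights = np.linalg.solve(H @ B @ H.T + R, innovation)

    # Analysis: background plus the spatially spread correction.
    return background + B @ H.T @ weights

# Synthetic example: a flat 40 ug/m3 background and one point value of 50 ug/m3.
coords = np.arange(5.0)
background = np.full(5, 40.0)
analysis = optimal_interpolation(background, obs=[50.0], obs_idx=[2],
                                 coords=coords, sigma_b=10.0, sigma_o=10.0,
                                 length_scale=1.0)
print(analysis)
```

With equal background and observation error variances, the analysis at the observed cell falls halfway between the two inputs, and the correction decays with distance from that cell. In practice the error variances would have to be estimated, for example from validation RMSE values such as those reported above, which is a design choice this sketch leaves open.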