Single Station PM2.5 Concentration Prediction And Its Spatiotemporal Influencing Factors Based On Machine Learning
Posted on: 2022-03-03 | Degree: Master | Type: Thesis | Country: China | Candidate: H Y Yao | GTID: 2491306479980709 | Subject: Cartography and Geographic Information System

Abstract/Summary:

With China's rapid industrialization and urbanization, the impact of air pollution has become increasingly serious. As one of the main air pollutants, PM2.5 has drawn wide concern because of its serious threat to human health, and it has become an important indicator for air quality early warning. Timely and accurate prediction of PM2.5 concentration can guide people's outdoor work and activities and is of great significance for reducing the health risks of PM2.5; it is currently a hot topic in PM2.5 research. With advances in computing and the accumulation of large amounts of measured data, PM2.5 concentration prediction based on statistical models has gradually emerged. Among these approaches, machine learning methods, which offer better predictive performance, are one of the main directions of development. However, when machine learning is used to predict the PM2.5 concentration at a single station, there is still insufficient research on how to select key factors as input parameters in a targeted manner, and on how large a spatial range should be considered for key factors with spatial characteristics. Research on this issue will help simplify machine learning models and improve the efficiency of modeling and operation. It will also help reveal, through machine learning, the main factors that affect changes in PM2.5 concentration and their interactions, support the formulation of effective prevention and control measures suited to local conditions, and provide a reference for the rational siting of PM2.5 monitoring stations. In this
study, the Shanghai Shiwuchang Air Quality Monitoring Station (National Control Station Number: 1142A) was taken as an example, and a series of random forest models was constructed with different spatiotemporal factors over different spatial ranges to predict the station's PM2.5 concentration over the next 1~24 hours. By combining common accuracy statistics such as RMSE with operational indicators such as the false alarm rate and missing alarm rate for pollution incidents, together with a mechanism analysis of how meteorological elements affect the atmospheric transport of particulate matter, the study explores how to determine the spatial range of the information used in the model and how each factor influences predictions at different lead times. The aim is a modeling method that combines machine learning with local characteristics and delivers highly reliable short-term PM2.5 concentration prediction. The main research contents and results are as follows:

(1) The influence of historical PM2.5 concentration, time, and meteorological elements on PM2.5 concentration prediction is analyzed by adjusting the input factors, without considering surrounding conditions. The results show that the PM2.5 concentration at the prediction station in the preceding one and two hours has an important impact on short-term prediction. Adding time elements (the month, week, and hour at the time of prediction) yields little improvement in prediction accuracy. Adding meteorological elements from the nearest station has a significant positive impact on predictions for the next 8~24 hours, but its effect on the false alarm rate and missing alarm rate of pollution incidents is relatively limited, and in some cases negative.

(2) This paper proposes a method for selecting key surrounding stations based on time-lag cross-correlation analysis, and the representative surrounding air quality stations related to the predicted station at different prediction hours are
selected using this method. The results show that, as the prediction time increases, the selected surrounding stations are distributed from near to far toward the northwest of the predicted station, which coincides with the location of the station's external PM2.5 sources and their transport trajectory along the wind. This indicates that the method conforms to the physical mechanism of particle transport and is generalizable. Adding the PM2.5 concentration of the surrounding air quality stations to the model significantly improves both the forecast accuracy and the false alarm rate. On this basis, selecting meteorological stations within the buffer around each selected air quality station and adding them to the prediction model achieves almost the same prediction accuracy as using all meteorological stations. This shows that the method can select the most representative surrounding stations in a way that conforms to the physical mechanism of particulate matter transport, thereby effectively reducing the number of noisy input features in the machine learning model.

(3) A random forest model that predicts PM2.5 concentration from a variety of factors is established. The results show that the multi-factor prediction model that considers the historical concentration in the preceding 1 and 2 hours, the PM2.5 concentration of the surrounding stations, and the surrounding meteorological elements has, on the whole, the best prediction accuracy and the lowest false alarm and missing alarm rates. In the 24th-hour prediction, compared with the control model group that considers only the historical concentration, MAE decreased from 21.2 μg/m³ to 16.6 μg/m³ and RMSE decreased from 28.3 μg/m³ to 22.5 μg/m³, and the false alarm rate and missing alarm rate were reduced by 24% and 21%, respectively. Compared with forecast accuracy indices that consider only the full time series, which contains a large number of good-weather conditions, the false alarm rate and missing alarm rate for air pollution incidents are more
pertinent and practical.

Keywords/Search Tags: PM2.5, Time Lags Cross Correlation, Factor analysis, Prediction, Machine learning, Random forest, false alarm rate, missing alarm rate
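
The station selection in result (2) relies on time-lag cross-correlation. The abstract does not give the exact formulation, so the following is only a minimal sketch under stated assumptions: Pearson correlation between a neighbor station's series and the target station's series shifted by a lag in hours, with neighbors ranked by their correlation at a lag equal to the prediction lead time. The function names, the `top_n` parameter, and the synthetic data are hypothetical and are not taken from the thesis.

```python
import numpy as np


def lagged_correlation(neighbor, target, lag):
    """Pearson correlation between neighbor[t] and target[t + lag] (lag >= 0)."""
    neighbor = np.asarray(neighbor, dtype=float)
    target = np.asarray(target, dtype=float)
    if lag > 0:
        x, y = neighbor[:-lag], target[lag:]
    else:
        x, y = neighbor, target
    return float(np.corrcoef(x, y)[0, 1])


def best_lag(neighbor, target, max_lag):
    """Return (lag, correlation) with the highest correlation for 0 <= lag <= max_lag."""
    scores = [lagged_correlation(neighbor, target, k) for k in range(max_lag + 1)]
    k = int(np.argmax(scores))
    return k, scores[k]


def select_stations(neighbors, target, lead_hours, top_n=3):
    """Rank neighbor stations (dict: name -> series) by correlation at lag == lead_hours."""
    ranked = sorted(
        neighbors,
        key=lambda name: lagged_correlation(neighbors[name], target, lead_hours),
        reverse=True,
    )
    return ranked[:top_n]


if __name__ == "__main__":
    # Synthetic hourly series (assumption): the target follows "upwind" 6 hours later.
    rng = np.random.default_rng(0)
    upwind = rng.normal(50.0, 15.0, 500)
    unrelated = rng.normal(50.0, 15.0, 500)
    target = np.roll(upwind, 6) + rng.normal(0.0, 2.0, 500)

    print(best_lag(upwind, target, max_lag=12))
    print(select_stations({"upwind": upwind, "unrelated": unrelated}, target, lead_hours=6, top_n=1))
```

Ranking at a lag equal to the lead time is one plausible reading of selecting stations "under different prediction hours"; the thesis may use a different criterion or correlation threshold.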
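
The false alarm rate and missing alarm rate for pollution incidents are not defined in the abstract. The sketch below assumes common event-verification definitions: a pollution incident is any hour whose concentration exceeds a threshold, the false alarm rate is the share of predicted incidents that did not occur, and the missing alarm rate is the share of observed incidents that were not predicted. The 75 μg/m³ default and the function name `alarm_rates` are illustrative assumptions, not values from the thesis.

```python
import numpy as np


def alarm_rates(observed, predicted, threshold=75.0):
    """Return (false_alarm_rate, missing_alarm_rate) for threshold exceedances.

    Assumed definitions:
      false alarm rate   = false alarms / (hits + false alarms)
      missing alarm rate = misses / (hits + misses)
    A rate is NaN when its denominator is zero.
    """
    obs = np.asarray(observed, dtype=float) > threshold
    pred = np.asarray(predicted, dtype=float) > threshold

    hits = int(np.sum(obs & pred))            # incident observed and predicted
    false_alarms = int(np.sum(~obs & pred))   # predicted but not observed
    misses = int(np.sum(obs & ~pred))         # observed but not predicted

    predicted_events = hits + false_alarms
    observed_events = hits + misses
    far = false_alarms / predicted_events if predicted_events else float("nan")
    mar = misses / observed_events if observed_events else float("nan")
    return far, mar


if __name__ == "__main__":
    observed = [80, 90, 40, 30, 100]   # made-up hourly values in μg/m³
    predicted = [85, 60, 80, 20, 110]
    print(alarm_rates(observed, predicted))
```

Because both rates are computed only over hours involving an incident, they are not diluted by the many clean-air hours in a full time series, which is the advantage the abstract attributes to these indicators over MAE and RMSE.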