With the rapid development of China’s economy and the continuous improvement of people’s living standards, the conflict between environmental protection and development has become increasingly prominent. The environment has been greatly damaged in the course of social development, mainly the air, water, and soil on which humans, animals, and plants depend. In recent years, regional air pollution episodes have occurred frequently in China, especially in the Beijing-Tianjin-Hebei region. They not only disrupt normal production, daily life, work, and study, but also harm people’s physical and mental health. The concentration of the primary pollutant in the air is an important indicator of air quality. As a core city of the Beijing-Tianjin-Hebei region, Beijing has attracted much attention for its air quality. Timely forecasting of and warning about the primary pollutant in Beijing is therefore an important task. This thesis studies the prediction of the primary pollutant concentration in Beijing from the following aspects. Firstly, we collected daily air quality monitoring data for Beijing from December 2013 to August 2018 and made a descriptive analysis of the overall air quality and the distribution of primary pollutants in Beijing over the past five years. The results show that the primary air pollutant in Beijing is PM2.5. Secondly, we introduced the principle of the random forest algorithm. Considering both pollutant factors and meteorological factors, we collected several important related indicators and analyzed the correlation between PM2.5 concentration and the other indicators using Pearson’s correlation coefficient and Spearman’s rank correlation coefficient. Thirdly, using ten-fold cross-validation, we established a random forest prediction model for Beijing’s primary pollutant concentration and compared it with a decision tree model, a boosting regression model, a bagging regression model, and a neural network model. The results show that, on the test data set, the Normalized Mean Square Error, Root Mean Square Error, and Mean Absolute Error rank by model as follows: decision tree model > bagging regression model > boosting regression model > neural network model > random forest regression model. That is, the random forest model has the lowest error on all three measures. We therefore conclude that the random forest algorithm has higher prediction accuracy and stronger generalization ability, and that it is worth promoting and applying in the practical prediction of urban pollutant concentrations. Finally, considering the impact and diffusion of inter-city air pollution, this thesis established a Gaussian plume model to study the distribution and evolution of PM2.5 concentrations in Beijing and its surrounding cities. It is hoped that this thesis can provide useful suggestions to the relevant cities and departments for carrying out inter-city air pollution prevention and control.
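The three error measures used for the model comparison can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the definitions here are assumptions: in particular, NMSE is taken to be the sum of squared errors divided by the sum of squared deviations of the observations from their mean, which is one common convention. The function names and the sample PM2.5 values are hypothetical and are not taken from the thesis data.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error: square root of the mean squared residual."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean Absolute Error: mean of the absolute residuals."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def nmse(observed, predicted):
    """Normalized Mean Square Error (assumed convention):
    squared error divided by the squared deviation of the
    observations from their own mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return sse / sst


# Hypothetical daily PM2.5 concentrations (micrograms per cubic metre).
observed = [35.0, 80.0, 120.0, 60.0]
predicted = [40.0, 70.0, 110.0, 65.0]

scores = {
    "NMSE": nmse(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
}
```

Under these definitions, lower values indicate a better fit on all three measures, and an NMSE below 1 means the model outperforms simply predicting the mean of the observations.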