| In recent years, the problem of air pollution has become increasingly evident and has had a significant impact on people. The main cause of haze is suspended particulate matter in the air, known as PM10 and PM2.5. Because haze seriously affects the environment and people's health, it is important to predict the PM2.5 concentration accurately. A high-precision PM2.5 concentration forecasting model makes it possible to anticipate the trend of PM2.5 in advance, so that countermeasures can be taken promptly. In this thesis, the prediction of PM2.5 concentration is carried out as follows: 1. This thesis collects daily average concentrations of PM2.5 and other air pollutants and combines them with meteorological data for the same period (air temperature, dew-point temperature, relative humidity, atmospheric pressure, wind speed, precipitation) as sample data, all measured in Beijing from January 2014 to April 2017. Correlation analysis is used to identify the important factors that affect PM2.5. Its conclusions clarify the relationship between PM2.5 and the various influencing factors and guide the choice of optimal variables for the forecasting model. 2. The basics of the support vector regression (SVR) algorithm and the random forest (RF) algorithm are introduced. Based on the conclusions of the correlation analysis, the optimal variables for the forecasting model are selected. The optimal model parameters are found through a parameter optimization method, and PM2.5 concentration models based on support vector regression and random forest, respectively, are established. The data from January 2014 to March 2017 are used as the training dataset, and the remaining data, from April 1 to April 30, 2017, as the validation dataset to test the generalization ability of the models. The results show that both models agree well with the actual data and have good generalization ability for predicting PM2.5. The two models differ, however, in the error between predicted and actual values: the SVR model's RMSE is 11.3098, while the random forest's RMSE is 14.8519, which indicates that the SVR model generalizes better. Based on the generalization ability and stability of the models, we choose the SVR model as the prediction model for PM2.5. |
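
The evaluation steps summarized in the abstract (correlation screening, a chronological hold-out, and comparison by RMSE) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `pearson_r`, `chronological_split`, and `rmse` are hypothetical, the records are assumed to be sorted by date, and the sketch does not implement SVR or random forest themselves.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def chronological_split(records, n_holdout):
    """Split date-ordered records: the last n_holdout go to validation.

    Assumption: records are already sorted by date, so the hold-out
    is the most recent period (e.g. the final month of daily data).
    """
    if not 0 < n_holdout < len(records):
        raise ValueError("n_holdout must be between 1 and len(records) - 1")
    return records[:-n_holdout], records[-n_holdout:]


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

With predictions from two fitted models on the same validation set, the model with the lower `rmse` value would be preferred, which mirrors how the abstract compares the SVR and random forest models.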