Application Of Random Forest Model In Fine Particle Concentration Prediction In Taiyuan

Posted on: 2018-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Yang
Full Text: PDF
GTID: 2321330536466082
Subject: Statistics
Abstract/Summary:
The primary pollutant is the most heavily polluting species in the air, and its concentration is an important indicator of air quality. As air quality worsens, timely measures such as forecasting and early warning become very important. As a typical energy and chemical industry base, Taiyuan has long suffered from air pollution. The city's air quality problem has attracted considerable attention from both the government and citizens, so further study of air quality in Taiyuan is urgently needed.

First, building on previous studies, this study analyzes daily air pollutant concentrations and surface meteorological conditions in Taiyuan from December 1, 2013 to December 31, 2016. The air quality characteristics and the distribution of days by primary pollutant over the three years show that the major primary pollutants in Taiyuan are particulates (PM10 and PM2.5). PM10 is the major primary pollutant in spring and summer, while PM2.5 is the major primary pollutant in autumn and winter. During 2014-2016, there were 138 days rated moderately polluted or worse, of which 17 fell in spring and summer and 121 in autumn and winter. Because air quality in Taiyuan is worst in autumn and winter, this study predicts PM2.5 only for these two seasons.

Next, the theory of the random forest model used in this study is systematically presented. Then, based on previous studies and from the perspective of air pollutant and meteorological parameters, this study collects the key factors that influence PM2.5 concentration and analyzes the Pearson correlation coefficient and the Spearman rank correlation coefficient between PM2.5 concentration and these factors. Finally, 10-fold cross-validation is used to build a PM2.5 concentration forecasting model based on the random forest algorithm, and the results are compared with those of a traditional linear regression model, a Boosting regression model, and a support vector regression model.

The results show that, on the test set, the prediction performance indexes NMSE, MAE and RMSE rank from largest to smallest as: linear regression model > Boosting regression model > support vector regression model > random forest regression model. Meanwhile, the prediction performance index R on the test set ranks from largest to smallest as: random forest regression model > support vector regression model > Boosting regression model > linear regression model. In summary, compared with the other three models, the random forest model offers higher prediction accuracy, stronger generalization ability, and no need for feature selection. Therefore, this method is worth applying and popularizing for the prediction of particulate concentration in urban areas.
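
The 10-fold cross-validation comparison described in the abstract can be sketched roughly as follows. This is only an illustration, not the thesis's actual code or data. It rests on several assumptions: scikit-learn is used, the features and target are synthetic stand-ins for the pollutant and meteorological variables, `GradientBoostingRegressor` stands in for the unspecified Boosting regression model, NMSE is taken to be the mean squared error divided by the variance of the observed values, and R is taken to be the Pearson correlation between observed and predicted values. The helper names `make_synthetic_data`, `evaluate` and `compare_models` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_synthetic_data(n_samples=300, n_features=6, seed=0):
    """Synthetic stand-in for daily pollutant and meteorological features."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    # Hypothetical target with a linear part, an interaction term and noise.
    y = (60 + 25 * X[:, 0] - 15 * X[:, 1] + 10 * X[:, 2] * X[:, 3]
         + rng.normal(scale=5, size=n_samples))
    return X, y


def evaluate(y_true, y_pred):
    """Compute NMSE, MAE, RMSE and R (definitions are assumptions, see lead-in)."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    return {
        "NMSE": mse / np.var(y_true),            # MSE normalised by target variance
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(mse),
        "R": np.corrcoef(y_true, y_pred)[0, 1],  # Pearson correlation
    }


def compare_models(X, y, n_splits=10, seed=0):
    """Score four regressors on pooled out-of-fold predictions from K-fold CV."""
    models = {
        "linear": LinearRegression(),
        "boosting": GradientBoostingRegressor(random_state=seed),
        # SVR is scale-sensitive, so standardise inside each training fold.
        "svr": make_pipeline(StandardScaler(), SVR(C=10.0)),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        # Every sample is predicted by a model that never saw it during fitting.
        y_pred = cross_val_predict(model, X, y, cv=cv)
        results[name] = evaluate(y, y_pred)
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, scores in compare_models(X, y).items():
        print(name, {k: round(float(v), 3) for k, v in scores.items()})
```

On synthetic data the ranking of the four models need not match the ranking reported above. The sketch only shows how the error indexes (smaller is better) and R (larger is better) could be computed on held-out folds.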
Keywords/Search Tags: primary pollutant, Taiyuan city, cross validation, random forest regression model