
Analysis And Prediction Of Air Quality In Wuhan Based On Ensemble Learning

Posted on: 2023-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: T Yin
GTID: 2531307163995999
Subject: Applied statistics

Abstract:
With the rapid development of China's economy and accelerating urbanization, environmental problems such as air pollution have become increasingly prominent. Air pollution affects people's health, production and daily life, so building models that analyze and predict air pollution is of practical significance for air pollution prevention and control in China. Using daily air-quality data for Wuhan, this thesis applies four ensemble learning methods to predict the air quality index (AQI).

First, the overall situation and temporal distribution of AQI and air pollutant concentrations in Wuhan are described through visualization. Based on the individual air quality index (IAQI) calculation method, the primary pollutants in Wuhan are analyzed and the main air pollutants are identified. A Pearson correlation coefficient matrix and heat map of AQI and each feature are constructed, a meteorological analysis is carried out, and the key factors influencing AQI are explored. Ridge regression is then used to select 23 feature variables as the inputs for model training.

Second, model performance evaluation indices are chosen as the criteria for building, comparing and improving the prediction models. Random forest, deep forest, gradient boosting tree and XGBoost models are constructed to predict AQI. Whale optimization and Bayesian optimization algorithms are used to tune the models' hyperparameters and improve their performance, and the four optimized models are compared. The comparison shows that the gradient boosting tree achieves the best prediction accuracy, followed by deep forest, but both take longer to train; random forest and XGBoost train faster both before and after optimization, but their accuracy is lower.

Finally, because the four models differ in both prediction accuracy and efficiency, the four models optimized by whale optimization and Bayesian optimization are used as primary learners and a linear regression model as the secondary learner, and the models are fused by stacking. Experiments with different combinations of primary learners show that combining random forest and gradient boosting tree gives the best predictions, further improving accuracy. This model can therefore predict daily AQI accurately and provide scientific support for air quality prediction and air pollution early warning.
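The abstract refers to the IAQI calculation method without stating it. The sketch below assumes the standard piecewise-linear method of China's HJ 633-2012 regulation: each pollutant's IAQI is interpolated as IAQI_P = (IAQI_Hi - IAQI_Lo) / (BP_Hi - BP_Lo) * (C_P - BP_Lo) + IAQI_Lo between the two breakpoints bracketing its concentration C_P, the AQI is the maximum IAQI, and the pollutant(s) attaining that maximum are reported as primary only when AQI exceeds 50. The breakpoint table (daily means only, O3 omitted for brevity) and the function names `iaqi` and `aqi_and_primary` are illustrative assumptions, not taken from the thesis; check the table against the standard before reuse.

```python
# Assumed daily-mean breakpoints from China's HJ 633-2012 (verify before reuse).
# Entry k of each row corresponds to IAQI_LEVELS[k]; O3 is omitted for brevity.
IAQI_LEVELS = [0, 50, 100, 150, 200, 300, 400, 500]
BREAKPOINTS = {
    "SO2":   [0, 50, 150, 475, 800, 1600, 2100, 2620],  # ug/m3
    "NO2":   [0, 40, 80, 180, 280, 565, 750, 940],      # ug/m3
    "PM10":  [0, 50, 150, 250, 350, 420, 500, 600],     # ug/m3
    "CO":    [0, 2, 4, 14, 24, 36, 48, 60],             # mg/m3
    "PM2.5": [0, 35, 75, 115, 150, 250, 350, 500],      # ug/m3
}


def iaqi(pollutant, conc):
    """Individual AQI by linear interpolation between bracketing breakpoints."""
    bp = BREAKPOINTS[pollutant]
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    if conc >= bp[-1]:
        return float(IAQI_LEVELS[-1])  # capped at the top of the scale
    for k in range(1, len(bp)):
        if conc <= bp[k]:
            bp_lo, bp_hi = bp[k - 1], bp[k]
            i_lo, i_hi = IAQI_LEVELS[k - 1], IAQI_LEVELS[k]
            return (i_hi - i_lo) / (bp_hi - bp_lo) * (conc - bp_lo) + i_lo


def aqi_and_primary(concs):
    """AQI is the max IAQI; primary pollutant(s) are reported only if AQI > 50."""
    sub = {p: iaqi(p, c) for p, c in concs.items()}
    aqi = max(sub.values())
    primary = [p for p, v in sub.items() if v == aqi] if aqi > 50 else []
    return aqi, primary, sub


# Illustrative concentrations, not Wuhan measurements.
aqi, primary, sub = aqi_and_primary({"PM2.5": 55, "PM10": 80, "NO2": 40})
print(aqi, primary)  # 75.0 ['PM2.5']
```

Counting, over many days, which pollutant is returned as primary is one plausible way to identify the "main air pollutants" the abstract mentions.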
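Whale optimization is named as one of the two hyperparameter tuners but not described. The following is a minimal pure-Python sketch of the Whale Optimization Algorithm in its commonly published form: a coefficient `a` shrinks from 2 to 0, and each candidate either encircles the best solution (|A| < 1), moves relative to a random candidate (|A| >= 1), or follows a logarithmic spiral toward the best solution. The exact variant, population size and iteration count used in the thesis are unknown; `whale_optimize`, its parameters and the quadratic `toy_loss` are hypothetical stand-ins. In a tuning setting the objective would presumably be a validation error as a function of hyperparameters, with integer hyperparameters such as tree counts rounded before fitting.

```python
import math
import random


def whale_optimize(objective, bounds, n_whales=20, n_iter=100, b=1.0, seed=0):
    """Minimise `objective` over a box with a basic Whale Optimization Algorithm.

    bounds: list of (low, high) pairs, one per search dimension.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)

    def clip(x):
        # Keep every coordinate inside its search interval.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # Random initial population of candidate solutions ("whales").
    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    scores = [objective(w) for w in whales]
    best_i = min(range(n_whales), key=lambda k: scores[k])
    best, best_f = list(whales[best_i]), scores[best_i]

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # shrinks linearly from 2 towards 0
        for i in range(n_whales):
            x = whales[i]
            A = 2.0 * a * rng.random() - a  # in [-a, a]
            C = 2.0 * rng.random()          # in [0, 2]
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale (exploitation);
                # |A| >= 1: move relative to a random whale (exploration).
                ref = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - xj) for r, xj in zip(ref, x)]
            else:
                # Bubble-net attack: logarithmic spiral around the best whale.
                l = rng.uniform(-1.0, 1.0)
                factor = math.exp(b * l) * math.cos(2.0 * math.pi * l)
                new = [abs(bj - xj) * factor + bj for bj, xj in zip(best, x)]
            whales[i] = clip(new)
            f = objective(whales[i])
            if f < best_f:  # greedily keep the best solution seen so far
                best, best_f = list(whales[i]), f
    return best, best_f


# Toy objective standing in for a validation error; minimum 0 at (2, -1).
def toy_loss(p):
    return (p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2


best, best_f = whale_optimize(toy_loss, bounds=[(-5.0, 5.0), (-5.0, 5.0)],
                              n_whales=30, n_iter=200)
print(best, best_f)
```

Each objective evaluation here is cheap; when it means retraining a gradient boosting tree or deep forest, the number of evaluations (`n_whales * n_iter` plus the initial population) dominates tuning time, which is consistent with the abstract's remark that those two models train more slowly.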
Keywords: Air quality index prediction, Ensemble learning, Whale optimization, Bayesian optimization, Stacking model fusion