
Air Quality Assessment And Classification Based On Statistical Learning Method

Posted on: 2020-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: S Z Guo
Full Text: PDF
GTID: 2417330578473081
Subject: Statistical machine learning
Abstract/Summary:
Air quality is closely tied to people's work and daily life. According to the 2018 Global Environmental Performance Index Report, China's environmental quality ranks 177th out of 180 countries and regions, and air pollution is one of the most urgent problems to be solved. To carry out the next stage of air pollution control effectively, air pollution data must be studied in depth to understand its trends and characteristics, and air quality must be evaluated and classified in a scientific and reasonable way so that effective advice for improving urban air quality can be given. The sample in this paper consists of air quality data and weather data for 31 provincial capital cities from December 1, 2013 to December 28, 2018. Missing values in the original dataset are filled in with the missing forest (missForest) imputation method. Statistical learning methods are used to study the data along the time dimension, several machine learning models are built to evaluate and classify urban air quality, and model performance is assessed from multiple angles. The main work of this paper is as follows:

(1) Analyze the overall air quality situation and study the primary pollutants. Most cities with a high air quality index (AQI) are in the north, and the index fluctuates strongly with the seasons. After 2016, O3 gradually replaced PM2.5 as the primary pollutant in some cities.

(2) Cluster the provincial capital cities based on pollutant data and analyze the air quality of typical cities along the time dimension. Using a combination of principal component analysis and hierarchical clustering, the 31 provincial capital cities are divided into three types, and Nanning, Beijing, and Zhengzhou are selected as typical cities. Since 2014, the annual average of the air quality indicators of the three typical cities has shown a downward trend, and the annual number of days meeting the air quality standard reached its highest level in 2018 in all three cities. Beijing's monthly average rebounds from May to July each year, because Beijing's average O3 value peaks in those months, when O3 also accounts for the largest share of primary pollutants. There is no significant difference in mean AQI between the days of the week, so the day of the week is not a key factor affecting the air quality index.

(3) To avoid the one-sidedness of evaluating air quality with the AQI alone, this paper considers factors such as pollutant concentration, weather, and periodic (cycle) variables, selects three machine learning methods to classify and predict the air quality level, and optimizes the models. In terms of prediction accuracy, the random forest is 4.18% higher than the BP neural network, and the GBDT algorithm is 4.55% higher than the BP neural network, reaching a prediction accuracy of 98.89%. In terms of running time, the random forest model is 61.766 s faster than the neural network and the GBDT model is 66.964 s faster. On the macro recall, macro precision, and macro F1 indicators, the GBDT algorithm also performs well. The GBDT algorithm can therefore be used effectively for air quality classification.
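The macro-averaged indicators named in point (3) can be illustrated with a minimal Python sketch. It is not the thesis's code: the function name `macro_metrics`, the level labels, and the sample predictions are hypothetical and chosen only for illustration. It also assumes macro F1 is the mean of the per-class F1 scores; the abstract does not say which definition was used, and some textbooks instead take the harmonic mean of macro precision and macro recall.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (macro_precision, macro_recall, macro_f1) for multi-class labels.

    Each class is scored one-vs-rest, then the per-class scores are
    averaged with equal weight, regardless of class frequency.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        prec = tp[c] / p_den if p_den else 0.0
        rec = tp[c] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical air quality levels and predictions, for illustration only.
y_true = ["good", "good", "moderate", "moderate", "unhealthy", "unhealthy"]
y_pred = ["good", "moderate", "moderate", "moderate", "unhealthy", "good"]

macro_p, macro_r, macro_f1 = macro_metrics(y_true, y_pred)
print(f"macro precision={macro_p:.4f} recall={macro_r:.4f} F1={macro_f1:.4f}")
```

Because every class is weighted equally, these macro scores are sensitive to errors on rare air quality levels that overall accuracy can hide. This is one plausible reason to report them alongside accuracy.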
Keywords/Search Tags:Hierarchical clustering analysis, Primary pollutant, BP neural network, Random forest, GBDT algorithm, Performance evaluation