
Design And Implementation Of China Air Quality Statistical Analysis System Based On Hive Data Warehouse

Posted on: 2021-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhang
GTID: 2381330623969896
Subject: Statistics
Abstract/Summary:
With the sustained development of the economy and of information technology, China's air pollution and meteorological monitoring networks have steadily improved, and large volumes of air pollution and meteorological data have accumulated. Under these circumstances, a scientific, well-designed, timely, and effective data analysis system can make full use of the air pollution data and can support the formulation of China's air pollution control policy by providing analytical reference information. In this thesis, using air pollution and meteorological data for China from 2014 to 2019, we adopt Hadoop, Hive, Spark, and Pyecharts as the technical frameworks to design and implement a Chinese air quality statistical analysis system with data collection, storage, analysis, and visualization capabilities.

In addition to meeting basic storage requirements, the Chinese air pollution data warehouse is designed with tiered storage and partitioned storage. Tiered storage divides the original data into several tables that are stored hierarchically in the data warehouse according to different data usage requirements, which improves the efficiency with which the air pollution data can be used. Partitioned storage derives partition information from the observation and collection time of the data and partitions the data accordingly. Functional testing shows that the partitioned storage strategy effectively improves retrieval efficiency on large-scale data.

The statistical analysis of Chinese air pollution comprises spatial distribution analysis, time series analysis, and meteorological factor influence analysis, at yearly, seasonal, monthly, and daily time scales. At these time scales, AQI data and meteorological data for the whole country and for the three major economic regions (the Beijing-Tianjin-Hebei region, the Yangtze River Delta, and the Pearl River Delta) are used for spatiotemporal evolution analysis and forecasting. The analysis of the influence of meteorological factors on air quality covers both single-factor and multi-factor studies, and discusses the effects of meteorological indicators such as atmospheric temperature, surface temperature, atmospheric pressure, precipitation, humidity, sunshine, and wind speed on air pollution.

The research conclusions are as follows. First, the annual distribution of air pollution shows that the AQI in most parts of China gradually improved from 2014 to 2018, and the number of cities and monitoring sites whose annual average air quality falls in the polluted levels decreased year by year. Second, the time series trend analysis shows that, among China's three major economic regions, the Beijing-Tianjin-Hebei region has the heaviest air pollution and the largest improvement, followed by the Yangtze River Delta; the Pearl River Delta has the lightest air pollution and therefore the smallest improvement. In the forecasting part, the accuracy of the monthly predictions of the SARIMA model and of the daily predictions of the RNN-LSTM model can reach 85.49% and 99.6%, respectively, so the models can accurately predict the future AQI levels of the regions. Third, the analysis of the influence of meteorological factors on trends shows that the influence of any single meteorological factor on AQI varies significantly across regions; however, the overall influence of meteorological factors on AQI in each region has been increasing year by year, indicating that the improvement in air quality was caused by emission controls.
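
The partitioned storage idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's Hive implementation: the function names (partition_key, partition_records, query_month), the year/month partition granularity, and the sample records are all hypothetical and chosen only for illustration. The sketch groups records by a partition key derived from the observation time, so that a query for one month reads a single partition instead of scanning every record.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(observed_at: datetime) -> str:
    """Build a Hive-style partition path segment from the observation time."""
    return f"year={observed_at.year}/month={observed_at.month:02d}"


def partition_records(records):
    """Group records into partitions keyed by observation year and month."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[partition_key(rec["observed_at"])].append(rec)
    return dict(partitions)


def query_month(partitions, year: int, month: int):
    """Return only the records of one partition, without scanning the others."""
    return partitions.get(f"year={year}/month={month:02d}", [])


# Made-up sample records, for illustration only (not data from the thesis).
records = [
    {"city": "Beijing", "observed_at": datetime(2014, 1, 5), "aqi": 180},
    {"city": "Beijing", "observed_at": datetime(2014, 1, 20), "aqi": 150},
    {"city": "Shanghai", "observed_at": datetime(2014, 2, 3), "aqi": 90},
]

partitions = partition_records(records)
january = query_month(partitions, 2014, 1)
```

In a real Hive table the same effect would presumably come from declaring partition columns and filtering on them, so that only the matching partition directories are read; the exact partition columns used in the thesis are not stated in the abstract.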
Keywords/Search Tags:Data Warehouse, Hadoop, Chinese Air Quality, Statistical Analysis