
Design and Implementation of a Big Data Platform for Atmospheric Pollution

Posted on: 2021-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhao
Full Text: PDF
GTID: 2381330602989833
Subject: Computer application technology
Abstract/Summary:
In recent years, air quality problems have occurred frequently in China. Atmospheric pollutants, represented mainly by PM2.5, cause tremendous harm to human health and to the atmospheric environment. Atmospheric pollution involves many aspects and many influencing factors. Reaching accurate conclusions requires analyzing and processing diverse data, including ground monitoring data for PM2.5, meteorological data, satellite remote sensing data, and pollution source data. Analyzing and processing these heterogeneous data has therefore become a key topic in atmospheric pollution monitoring. Big data technology offers a new approach to this problem, and its use for data integration, storage, and information mining has become a research hot spot in the field. Although China has completed a national ground monitoring network for PM2.5, the monitoring stations are unevenly distributed, so collecting and processing high-density, wide-coverage data has become another research hot spot. To address the collection, storage, analysis, and application of atmospheric pollution data, this thesis designs a big data platform for atmospheric pollution based on Hadoop and Spark. To address the uneven spatial distribution of ground station data, it introduces AOD (Aerosol Optical Depth) data and carries out applied research on PM2.5 across China. The main contents are as follows:

(1) The design and implementation of the atmospheric pollution platform. First, the relevant techniques are studied, including distributed collection frameworks, distributed file systems, and distributed parallel computing frameworks. Then a requirements analysis of the platform is conducted, and a platform architecture is designed that supports collection from different sources, storage of various atmospheric data, and data analysis in different application scenarios. The platform is built by integrating Hadoop and Spark. Flume and Kafka perform distributed collection from multiple sources and front-end servers. By combining Spark Streaming with Kafka, data streaming into the Kafka buffer is processed and computed in real time. HDFS and HBase provide distributed storage, and the Row Key is redesigned for the characteristics of atmospheric pollution data to optimize its storage. Integrating these big data components completes the distributed cluster environment. The platform is designed and implemented with a four-layer structure: a data collection and preprocessing module, a data storage module, a data analysis module, and a data visualization module.

(2) The research and design of the PM2.5 estimation and prediction algorithm. The Bagging and Boosting ensemble learning frameworks are compared. Building on the random forest, GBRT, and XGBoost algorithms, and taking into account the characteristics of atmospheric pollution data and the strengths of each algorithm, a multi-model fusion algorithm is designed that combines the three to optimize the model. The fusion algorithm can be plugged into the platform's data analysis engine. Simulation tests show that the multi-model fusion algorithm outperforms random forest, GBRT, and XGBoost in every respect, which further improves model accuracy and confirms the effectiveness of the algorithm for analyzing atmospheric pollution data.

(3) The application of the estimation and prediction algorithm to data analysis on the platform. The algorithm is used to estimate ground-level PM2.5 concentrations and to forecast PM2.5 concentrations hour by hour, and the platform's functions are tested. The accuracy of the algorithm is verified through spatiotemporal analysis of the prediction results. Tests on 2016 ground PM2.5 data from 1497 stations of the China National Environmental Monitoring Center show that the platform's functions support hourly PM2.5 prediction and estimation and enable real-time monitoring of changes in PM2.5 concentration. In practice, the platform's functional modules run smoothly, effectively, and stably, and meet the needs of data integration, storage, and analysis for atmospheric pollution. The platform provides a scientific basis for decisions on effective precautions against atmospheric pollution.
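
The abstract does not state how the three models in (2) are combined, so the following is only an illustrative sketch of one common fusion rule, not the thesis's method. It assumes each model (random forest, GBRT, XGBoost) has already produced predictions on a held-out validation set, and it weights each model by the inverse of its validation RMSE. The function names and all numbers are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def inverse_rmse_weights(y_true, model_preds):
    """Assumed fusion rule: weight each model by 1/RMSE, normalized to sum to 1.

    model_preds maps a model name to its predictions on the validation set.
    """
    inv = {
        name: 1.0 / max(rmse(y_true, preds), 1e-12)  # guard against zero error
        for name, preds in model_preds.items()
    }
    total = sum(inv.values())
    return {name: value / total for name, value in inv.items()}


def fuse(weights, model_preds):
    """Weighted average of the per-model predictions, sample by sample."""
    n = len(next(iter(model_preds.values())))
    return [
        sum(weights[name] * model_preds[name][i] for name in weights)
        for i in range(n)
    ]


# Hypothetical PM2.5 values (ug/m3), made up purely for illustration.
y_val = [35.0, 50.0, 80.0, 120.0]
val_preds = {
    "random_forest": [30.0, 55.0, 85.0, 110.0],
    "gbrt": [38.0, 47.0, 78.0, 125.0],
    "xgboost": [36.0, 52.0, 77.0, 118.0],
}

weights = inverse_rmse_weights(y_val, val_preds)
fused = fuse(weights, val_preds)
```

In this sketch the model with the lowest validation error receives the largest weight. In practice the weights would be fitted on validation data and then applied to predictions on new data; stacking with a learned meta-model is another possible choice.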
Keywords/Search Tags: big data, Hadoop, Spark, ensemble learning, PM2.5, atmospheric pollution