Research On Estimation And Concentration Prediction Model Of Pollutant Missing Abnormal Data In The Field Of Atmospheric Environment

Posted on: 2022-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
GTID: 2491306488460174
Subject: Master of Engineering
Abstract:
Scientific and technological progress drives social development, but rapid economic growth brings heavy resource consumption and harmful gas emissions, which lead to frequent environmental pollution and deteriorating air quality and endanger human health and the ecological environment. Analyzing and predicting trends in atmospheric pollutant concentrations has long been a research hotspot. Such analysis depends on accurate, complete and scientifically sound basic data. During data collection, transmission and storage, however, sensor failures and network disconnections can leave large ranges of missing and abnormal data. Traditional methods for completing missing and abnormal data carry considerable uncertainty and large errors, so analysis and prediction built on them accumulate a high deviation. This thesis therefore estimates and fills in missing and abnormal data in the atmospheric environment field, providing accurate basic data for comprehensively assessing air pollution status, tracing the main causes of air quality deterioration to pollution sources, and monitoring and predicting pollutants in real time. The specific research contents are as follows:

(1) Construction of an estimation model for missing and abnormal air quality pollutant data. The Yunnan-Guizhou Plateau region was selected as the study area, and air quality pollutant data and meteorological data from 2017 to 2018 were used as the basic data to determine which pollutant indicators have missing and abnormal data, and to what degree. The statistics show that ground monitoring stations in many cities have large amounts of missing and abnormal data, most notably for PM10. Based on the nonlinear characteristics of air quality pollutant data, an estimation model (w-SVR-GS) was constructed for estimating large amounts of missing and abnormal data. The same weighting idea was also added to the multiple linear regression (MLR) model, and three evaluation criteria (R², RMSE, and MAPE) were used to compare the two weighted algorithms with the traditional algorithms. The estimation results show that weight extraction greatly improves the estimation performance of the model, which reaches R² = 0.97, RMSE = 11.7, and MAPE = 2.35%. Compared with the traditional model, the prediction accuracy of this model is improved by 21%. Adding weights also improved the estimation ability of the traditional linear regression model by 21% and decreased its MAPE by 4.7%.

(2) Analysis of primary pollutants based on spatiotemporal characteristics. The thesis comprehensively examines air pollution in the region from 2017 to 2018. Following the relevant regulations on the atmospheric environment, the types of primary pollutants and their numbers of occurrences were identified, and all pollutants were analyzed statistically by their temporal and spatial characteristics, which provides the prediction targets for the prediction model in part (3). The calculations show that the region has three primary air quality pollutants (PM10, PM2.5 and O3) and that their numbers of occurrences are generally decreasing. O3 occurs as the primary pollutant more often than either of the other two, more than 300 times. Statistical analysis of daily and quarterly average concentrations shows that PM2.5 exceeded the level II concentration limit in China and came close to the level III limit. PM10 and PM2.5 concentrations peaked in winter, when emissions were largest. The variation patterns of O3 and PM2.5 were highly consistent between late spring and early summer, which points to a special mechanism, but showed contrasting trends in other periods. Spatially, O3 and PM2.5 also show contrasting distributions. The distribution of PM10 is basically the same as that of PM2.5, and the distributions of SO2 and NO2 are also similar to each other. PM10 and PM2.5 concentrations in the region are mainly concentrated in several cities of Guangxi and in the northern part of Guizhou Province, with an overall pattern of high in the east and low in the west. Air quality in Yunnan Province is relatively good: apart from a higher O3 concentration, its pollutant concentrations are lower than those in Guizhou and Guangxi.

(3) Establishment of a primary pollutant concentration prediction model based on the long short-term memory (LSTM) neural network. To better predict how air quality pollutant concentrations change, an LSTM network from deep learning is used to build the prediction model for primary pollutant concentrations. The data are divided in different proportions, and a suitable ratio of training data to prediction data is selected for each prediction target. On this basis, the thesis explores how the preceding sequence data of the prediction target influences the prediction results, and analyzes the prediction results for different periods. The results show that, under the same network structure, the data split ratio has some influence on prediction accuracy. The preceding sequence data of the prediction target is itself an important constraint on the model: when the target's preceding data is included in training, the model captures the time-series characteristics of the indicator more effectively and produces more accurate results. Compared with a single non-target input sequence, multiple non-target input sequences improve prediction accuracy.
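The three evaluation criteria named in part (1) can be written down in a few lines. The sketch below is illustrative only: it assumes R² is the coefficient of determination and MAPE is the mean absolute percentage error over non-zero observations, which the abstract does not spell out. The function names and the toy concentration values are hypothetical and are not taken from the thesis.

```python
import math


def r2_score(observed, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, estimated):
    """Root mean squared error, in the same unit as the data."""
    squared = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(squared / len(observed))


def mape(observed, estimated):
    """Mean absolute percentage error in percent; observations must be non-zero."""
    ratios = sum(abs((o - e) / o) for o, e in zip(observed, estimated))
    return 100.0 * ratios / len(observed)


# Toy values standing in for held-out pollutant concentrations (made up).
observed = [50.0, 80.0, 100.0, 40.0]
estimated = [55.0, 76.0, 90.0, 42.0]

r2_value = r2_score(observed, estimated)
rmse_value = rmse(observed, estimated)
mape_value = mape(observed, estimated)
print(f"R2={r2_value:.3f} RMSE={rmse_value:.2f} MAPE={mape_value:.2f}%")
```

To score an imputation method this way, one would typically hide a set of known values, estimate them, and compare the estimates with the hidden observations.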
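The data division and the role of the target's preceding sequence described in part (3) can be illustrated with a small input-construction sketch. It is a minimal example under stated assumptions, not the thesis's pipeline: it assumes one-step-ahead prediction, a fixed look-back window, and a chronological split. The names `make_windows`, `chronological_split`, `lookback` and `train_fraction`, and all numbers, are hypothetical.

```python
def make_windows(target, covariates, lookback, include_target_history=True):
    """Build (window, label) pairs for one-step-ahead prediction.

    Each window holds `lookback` time steps. Each step lists the non-target
    series values and, optionally, the target's own preceding value.
    """
    samples = []
    for t in range(lookback, len(target)):
        window = []
        for step in range(t - lookback, t):
            row = [series[step] for series in covariates]
            if include_target_history:
                row.append(target[step])
            window.append(row)
        samples.append((window, target[t]))
    return samples


def chronological_split(samples, train_fraction):
    """Split without shuffling, so test samples are later in time than training ones."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Toy series (made up): one prediction target and two non-target sequences.
target = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0, 18.0, 17.0, 19.0, 21.0]
covariate_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
covariate_b = [0.5, 0.4, 0.6, 0.5, 0.7, 0.6, 0.8, 0.7, 0.9, 0.8, 1.0]

with_history = make_windows(target, [covariate_a, covariate_b], lookback=3)
without_history = make_windows(
    target, [covariate_a, covariate_b], lookback=3, include_target_history=False
)
train, test = chronological_split(with_history, train_fraction=0.75)
print(len(with_history), len(train), len(test))
```

Windows of this shape (samples, time steps, features) are what a recurrent model such as an LSTM usually consumes. Toggling `include_target_history`, changing the number of covariate series, or varying `train_fraction` corresponds to the comparisons the abstract describes.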
Keywords: Missing data estimation, Primary pollutants, Time series prediction, Support vector machine regression (SVR), Long short-term memory neural network (LSTM)