Research On PM2.5 Concentration Prediction Under The Background Of Big Data

Posted on:2020-05-24

Degree:Master

Type:Thesis

Country:China

Candidate:Y N Wang

Full Text:PDF

GTID:2431330578454515

Subject:Applied Statistics

Abstract/Summary:

With the rapid development of China's economy, haze occurs frequently, and PM2.5 is its main pollutant, so environmental protection is urgently needed. China has its own environmental monitoring system, but much of the data it collects is not fully used. Using historical data to predict PM2.5 concentration is therefore valuable: it can help people avoid pollution in time and give the government enough time to respond. The main research work of this thesis is as follows:

Chapter 1: Research background, research status, and research process.

Chapter 2: The theory used in this thesis, including statistical learning, linear regression, Naive Bayes, and model evaluation metrics.

Chapter 3: Data acquisition and preprocessing. The data set is shared by UCI and spans Jan 2, 2010 to Dec 31, 2014. It includes time, temperature, pressure, wind speed, and other variables. Preprocessing covers cleaning the data, checking its consistency, and handling missing values, so that the data better suits the models.

Chapter 4: Model building. Two models are built: a multiple linear regression model for different seasons, and a Naive Bayes model for predicting severely polluted weather. (1) The traditional multiple linear regression model is improved. A higher model score indicates better performance. The traditional multiple linear regression model scores 58.732, the model optimized using a heat map scores 65.987, and the model after iterative feature selection scores 69.657. Finally, separate models are discussed for the different seasons; the winter model scores 93.985. (2) Naive Bayes classification is used to study severely polluted weather. Repeated experiments show that the model parameters are best after the time factor is removed. The recall of the optimized model in predicting abnormal weather is 0.79, meaning that nearly 80% of abnormal-weather cases are correctly identified, so the model is applicable.

Chapter 5: Summary and outlook. Both final models are useful; however, the imbalance of the original data set affects the precision of the model in predicting severely polluted weather. Non-severe pollution weather makes up so large a share of the data that the classifier is biased toward that class. To address this problem, the thesis also discusses model optimization methods for imbalanced data sets and offers a feasible direction for future research.
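The interplay between recall, precision, and class imbalance mentioned in the abstract can be illustrated with a small Python sketch. It is not the thesis's code or data: the function name `precision_recall` and the label counts (100 severe days, 900 non-severe days, 90 false alarms) are hypothetical, chosen only so that recall comes out at 0.79.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels (assumption, not thesis data): 1 = severe pollution, 0 = not severe.
y_true = [1] * 100 + [0] * 900
# The classifier catches 79 of the 100 severe days and raises 90 false alarms.
y_pred = [1] * 79 + [0] * 21 + [1] * 90 + [0] * 810

precision, recall = precision_recall(y_true, y_pred)
print(f"recall={recall:.2f} precision={precision:.2f}")
# prints "recall=0.79 precision=0.47"
```

With these made-up counts, a 10% false-alarm rate on the large non-severe class already pulls precision below 0.5 even though recall stays at 0.79, which is the kind of effect an imbalanced data set can have on a classifier's precision.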

Keywords/Search Tags:

PM2.5, Predict, Linear regression, Naive Bayes, Non-Balance data

Related items

1	Aluminum Electrolytic Cell Health Evaluation Method And System Based On Combinatorial Weighted Naive Bayes
2	Research And Application Of Extruder Anomaly Detection Based On Time Series Analysis
3	Beijing PM2.5 Prediction Algorithm Based On Machine Learning
4	The Application Of The Data Mining Technology On Import Food Quality Inspection
5	Design Of Water Quality Monitoring System Based On NB-IoT And Research On Data Classification
6	Research And Design Of A Pressure-controlled Drilling Overflow Monitoring And Diagnosis System Based On Cloud Computing
7	The Research Of Gas Balance Forecast In The Energy Center Of Jinan Iron & Steel Group
8	Data Analysis For Air Pollution Incidents
9	Iron And Steel Enterprise Energy Management And Data Correction System Design And Implementation
10	A Study On The Design Of Garment Parts Based On Multiple Linear Regression