Research On Air Quality Time Series Data Processing And Clustering Analysis

Posted on:2024-04-03

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Zhu

GTID:2530306929973689

Subject:Electronic information

Abstract/Summary:

With the rise of artificial intelligence and the arrival of the big data era, large volumes of complex and varied data are being produced, and mining latent patterns and information from time series data has become an active research topic. As an unsupervised data mining technique, clustering analysis identifies structural features in time series data, groups similar series into the same cluster, and assigns dissimilar series to different clusters. Clustering air quality time series can help anticipate changes in air quality over a coming period, trace pollution sources, and support the decisions of policy makers. However, real-time air quality data collected from monitoring stations usually contain missing values, which reduce the accuracy of air quality time series mining, including clustering analysis. To cluster time series data more accurately, this thesis first proposes a feature-driven time series clustering algorithm, called kFeatTS, based on graphs constructed from mutual k nearest neighbors. Second, it proposes a "first five and last three" logistic regression imputation method, called FTLRI, to handle missing values in time series data. Finally, kFeatTS is applied to the clustering analysis of air quality time series data. Experiments show that FTLRI is effective for missing value imputation and that kFeatTS is effective for clustering air quality data. The main contributions of this thesis are summarized as follows:

(1) Common time series clustering methods measure similarity over time series fragments or over a fixed set of features, so they cannot handle feature-rich time series data. This thesis proposes a feature-driven time series clustering algorithm, called kFeatTS, based on graphs constructed from mutual k nearest neighbors. The method extracts the main features of the time series data, builds a graph for each main feature, integrates these graphs through a mixed matrix, and finally performs clustering. Experiments on 9 datasets from the UCR time series archive, comparing kFeatTS with 5 commonly used time series clustering algorithms from recent years, show that kFeatTS produces more accurate clustering results on datasets of various sizes, series lengths, and numbers of categories, and that it is reasonably robust.

(2) Missing values in air quality datasets reduce the accuracy of clustering analysis, yet common imputation methods do not account for the correlation of time series data along the time axis and cannot accurately fill data with a high missing rate. This thesis therefore proposes a "first five and last three" logistic regression imputation method, called FTLRI. Building on the sliding window model, it introduces a first-five-and-last-three model that takes into account both the correlation of the data along the time axis and the correlation between attributes. FTLRI combines these two correlations and uses a logistic regression algorithm to train a high-accuracy imputer that fills in the missing values. Before the experiments, rows with missing values are deleted so that each dataset is complete; values are then removed to produce versions with missing rates of 5%, 10%, 20%, and 40%, according to the size of each dataset and a fixed step size. On these processed datasets, FTLRI is compared with five common missing value imputation methods and one recent neural network imputation method, and it shows better imputation performance than the other methods.

(3) The proposed clustering algorithm kFeatTS is applied to the air quality data recorded in 2022 at the Lanyuan Hotel monitoring site in Lanzhou. The time series of six pollutants (PM₂.₅, PM₁₀, SO₂, NO₂, O₃, and CO) are analyzed separately. The experiment shows that, for each pollutant, kFeatTS divides the corresponding time series into clusters that reflect different pollution levels. This demonstrates the usefulness and correctness of kFeatTS for air quality time series analysis.
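The abstract does not say how the mutual k nearest neighbor graphs are built, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes each time series has already been reduced to a feature vector and that Euclidean distance is used; the function name `mutual_knn_graph` and the toy data are hypothetical. Two series are linked only when each is among the other's k nearest neighbors.

```python
import numpy as np


def mutual_knn_graph(features, k):
    """Build a mutual k-nearest-neighbor graph (hypothetical helper).

    features: array of shape (n_series, n_features), one row per series.
    k: number of neighbors, assumed to satisfy 1 <= k <= n_series - 1.
    Returns a symmetric 0/1 adjacency matrix of shape (n_series, n_series).
    """
    x = np.asarray(features, dtype=float)
    n = x.shape[0]

    # Pairwise Euclidean distances (an assumption; the thesis may use
    # a different similarity measure).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a series is never its own neighbor

    # Indices of the k closest series for every row.
    nearest = np.argsort(dist, axis=1)[:, :k]

    # Directed kNN relation: knn[i, j] is True when j is a neighbor of i.
    knn = np.zeros((n, n), dtype=bool)
    knn[np.repeat(np.arange(n), k), nearest.ravel()] = True

    # Keep an edge only when the neighbor relation holds in both directions.
    return (knn & knn.T).astype(int)


# Toy feature values for five series (made up for illustration).
points = np.array([[0.0], [0.1], [5.0], [5.1], [2.0]])
adjacency = mutual_knn_graph(points, k=1)
print(adjacency)
```

In this toy example the first two series link to each other, as do the third and fourth, while the fifth stays isolated because its nearest neighbor does not choose it in return. One such graph could be built per extracted feature before the graphs are merged, which is the step the abstract attributes to the mixed matrix.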

Keywords/Search Tags:

Time Series Data, Cluster Analysis, Missing Value Imputation, Air Quality Data

Related items

1	Multivariate Time Series Missing Data Imputation Based On TSnet
2	Comparison And Empirical Analysis Of Imputation Methods For Missing Data
3	Research On Time Series Missing Value Imputation Algorithm Based On Dimension And Distribution Forecast
4	Research On Imputation And Forecasting Of Multivariate Time Series Based On Transformer
5	Imputation Methods Of Missing Values For Compositional Data
6	Research Of Time Series Missing Values Imputation Method In Ecological Monitoring Stations
7	Impute Missing Values For Mixed Data
8	Incomplete Data Filled
9	Maximum likelihood estimation and multiple imputation: A Monte Carlo comparison of modern missing data techniques for multilevel data
10	Imputation For Missing Value Of Compositional Data Based On Biclustering Algorithm