Research On The Spatiotemporal Models Of Sensor Data Stream And Prediction Methods

Posted on: 2023-02-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Q Chen
GTID: 1528307025462584
Subject: Control Science and Engineering
Abstract/Summary:
With the development of sensor technology and the popularity of Internet of Things (IoT) applications, vast volumes of sensor data are produced every day. Practical analysis and utilization of sensor data are therefore at the core of IoT applications. However, applying sensor data poses many challenges, such as abnormal data, missing data, the high complexity of deep prediction models, and the complex spatiotemporal correlation of traffic sensor data. This thesis focuses on these four challenges and performs spatiotemporal modeling of the sensor data stream in four applications: how to model the spatiotemporal correlation of the data streams of nodes in wireless sensor networks and effectively detect multiple types of anomalies in sensor data in real time; how to model the spatiotemporal correlation and attribute correlation of sensor data streams between meteorological sites and estimate missing air pollution data; how to model the temporal dependence and spatial correlation of sensor data in a long-sequence prediction task and reduce the computation and memory overhead of the sensor data stream prediction model while maintaining its accuracy; and how to model the dynamic spatiotemporal correlation of traffic sensor data and improve the prediction accuracy of long sensor data sequences with complex graph structures. The original contributions of this thesis are as follows.

(1) A hypergrid-based adaptive learning method for detecting three types of data faults (HADF) in sensor data streams is proposed. The method combines hypergrid-based and statistical analysis-based techniques to detect three types of faults in sensor data in real time: outliers, stuck-at faults, and noisy faults. HADF is a distributed method that can be deployed on sensor nodes. It combines lazy learning and continuous learning to adapt its normal profile, reducing the influence of concept drift in unstable streaming data. In addition, this thesis constructs labeled datasets by manually inserting different anomalies into two real datasets whose attributes include temperature, humidity, and light. Extensive experimental results demonstrate that HADF detects data faults with higher accuracy than the four counterpart methods at reasonable efficiency.

(2) A missing-type-aware interpolation framework (IMA) is proposed for the problem of massive missing data in sensor data streams. Taking data from city-wide environmental monitoring systems with many scattered stations as an example, IMA considers three aspects of information, i.e., spatiotemporal correlation, attribute correlation within a single record, and correlation across all historical data, and develops three corresponding methods to estimate the missing data. First, IMA develops an improved multi-viewer method, which uses the spatiotemporal correlation of data from neighboring stations to estimate randomly missing values. Second, IMA applies a new Multi eXtreme Gradient Boosting (Multi-XGBoost) method to learn attribute correlation. Third, IMA uses matrix factorization to estimate large missing parts. Finally, by combining these three methods, IMA adaptively selects the appropriate interpolation method according to the type of missing data. This thesis conducts experiments on two real datasets from the Beijing air quality monitoring stations. Each dataset contains six pollution indicators (PM2.5, PM10, NO2, CO, O3, and SO2) and massive missing measurements. Experimental results show that IMA outperforms the counterpart methods in interpolating the missing measurements in terms of accuracy and effectiveness. Compared with the most closely related method (ST-MVL), IMA improves the interpolation accuracy from 0.818 to 0.849 on the small dataset and from 0.214 to 0.759 on the large one.

(3) A lightweight and efficient neural network called TTFNet is proposed to reduce the high computational complexity of sensor data stream prediction models. The method forecasts long time series using three features extracted from the raw time series: the Trend, Temporal attention, and Frequency attention. TTFNet performs a pooling operation on the historical data in a recent time window to extract a general trend, uses a multi-layer perceptron (MLP) to discover the temporal correlation between data as temporal attention, and applies the fast Fourier transform (FFT) to the data to obtain frequency information as frequency attention. Each feature is extracted by its own neural network branch, which produces its own output, and TTFNet weights the three outputs to generate the final prediction. The weights are learned together with the model parameters during training. Because the three branches are independent, they can also run in parallel. This thesis verifies the performance of TTFNet on three real datasets: electricity consumption load data, electricity transformer temperature data, and weather data. The experimental results show that the proposed method reduces memory overhead and runtime by 63% and 81% on average relative to the five counterpart methods while achieving comparable accuracy.

(4) A deep learning model (TGANet) is proposed to predict data with complex spatiotemporal characteristics, such as traffic sensor data streams. The model predicts the future traffic state by analyzing the spatiotemporal dynamics among the historical data of nodes in the traffic network. TGANet contains dilated causal convolution, multi-view graph convolution, and masked multi-head attention modules. The dilated causal convolution module mines the long-term temporal dependency of traffic data. The multi-view graph convolution module adopts non-Euclidean metrics such as Jaccard distance and Pearson correlation to initialize the weights of the adjacency matrices, and sequentially obtains rich semantic information from the traffic data stream. In addition, the masked multi-head attention module mines fine-grained spatiotemporal characteristics. The experimental results on two public traffic datasets demonstrate that TGANet achieves higher prediction accuracy than the counterpart methods in most cases. In long-term series prediction in particular (t > 40 min), TGANet outperforms all other methods.
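As an illustration of the adjacency initialization mentioned in contribution (4), the following minimal sketch builds two initial weight matrices from historical sensor readings, one from Pearson correlation and one from Jaccard similarity (one minus the Jaccard distance). It is not the thesis's implementation. The abstract does not say which sets the Jaccard metric is computed over or how negative correlations are handled, so the congestion-event sets, the `threshold` parameter, the clipping of negative correlations to zero, the zero diagonal, and all function names here are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant series: correlation is undefined, treat as 0
    return cov / (sx * sy)


def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B|; the Jaccard distance is 1 minus this value."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def congested_steps(series, threshold):
    """Hypothetical helper: time steps at which the reading is below threshold."""
    return {t for t, value in enumerate(series) if value < threshold}


def init_adjacency_views(readings, threshold):
    """Return (pearson_view, jaccard_view), two n x n initial weight matrices.

    readings[i] is the historical series of sensor i. Assumptions: negative
    correlations are clipped to 0 and the diagonal is left at 0.
    """
    n = len(readings)
    events = [congested_steps(r, threshold) for r in readings]
    pearson_view = [[0.0] * n for _ in range(n)]
    jaccard_view = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            pearson_view[i][j] = max(0.0, pearson(readings[i], readings[j]))
            jaccard_view[i][j] = jaccard_similarity(events[i], events[j])
    return pearson_view, jaccard_view


# Toy speed readings (km/h) for three sensors; values are made up.
speeds = [
    [60, 55, 30, 25, 50, 62],
    [58, 54, 32, 28, 49, 60],
    [40, 61, 59, 63, 35, 38],
]
pearson_view, jaccard_view = init_adjacency_views(speeds, threshold=45)
```

In this toy data, sensors 0 and 1 slow down at the same time steps, so both views give them a strong edge, while sensor 2 follows the opposite pattern and receives zero weight to sensor 0 in both views. In a trainable model, such matrices would serve only as starting values that training then adjusts.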
Keywords/Search Tags: sensor data stream, spatiotemporal correlation, anomaly detection, data interpolation, time-series prediction