Research And Application Of Data Cleaning And Repairing Methods In Production Process

Posted on: 2024-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: J R Pan
Full Text: PDF
GTID: 2568307112458174
Subject: Computer technology
Abstract/Summary:
Over the past decades, industries in China have developed rapidly and produced large amounts of data, including time series data. Time series data are usually high-dimensional and large in volume, which makes data cleaning and repair more difficult. To improve the accuracy of high-dimensional time series data, this thesis studies cleaning algorithms for high-dimensional time series. The main research contents are as follows:

(1) For errors that occur at one or several points in a time series, a cleaning algorithm based on speed constraints is proposed. By limiting the rate of change of the time series, a linear-time, locally optimal algorithm and the median rule are used to compute repairs online. The algorithm also supports stream cleaning and adaptively adjusts the window size to handle data that arrive out of order.

(2) Anomaly detection is carried out for high-dimensional time series with spatial correlation. This thesis proposes an efficient cleaning framework based on spatial properties. The framework first preprocesses the data, combines prior knowledge with the magnitude of the correlation to judge whether a sequence contains errors, and then applies the cleaning and repair algorithm only to the sequences where errors may occur. The framework can be combined with the speed constraint algorithm and with the generative adversarial network algorithm. Experiments show that accuracy and recall are better when the proportion of abnormal sequences is low.

(3) On the basis of correlation, a generative adversarial network is used to clean large-area errors that may exist in high-dimensional time series. The generative adversarial network is composed of two LSTM-RNNs. We verify its effect on a data set generated in an actual tobacco loosening and conditioning (moisture-regain) process, as well as on two other real data sets, and compare it with the algorithms described above. Experiments show that the algorithm performs well.
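To make the speed-constraint idea in (1) concrete, the following is a minimal Python sketch of a local, window-based repair that applies a median rule and then clamps each point to the range reachable from the previously repaired point. It is an illustration under stated assumptions, not the thesis's implementation: the function name `speed_constraint_repair`, the parameters `s_min`, `s_max`, and `window`, and the exact form of the median rule are all assumptions, and the adaptive window adjustment for out-of-order data is not covered.

```python
from statistics import median


def speed_constraint_repair(times, values, s_min, s_max, window):
    """Repair a series so consecutive points respect s_min <= speed <= s_max.

    Hypothetical sketch: timestamps must be strictly increasing and
    s_min <= s_max. Points are processed in one left-to-right pass.
    """
    n = len(values)
    repaired = []
    for k in range(n):
        # Candidate values for point k: the observation itself, plus the
        # bounds implied by each later point inside the look-ahead window.
        candidates = [values[k]]
        j = k + 1
        while j < n and times[j] - times[k] <= window:
            dt = times[k] - times[j]  # negative: point j lies after point k
            candidates.append(values[j] + s_min * dt)
            candidates.append(values[j] + s_max * dt)
            j += 1
        # Median rule: the candidate list always has odd length.
        mid = median(candidates)

        if k == 0:
            # No previous repaired point, so no range to clamp against.
            repaired.append(mid)
            continue

        # Range reachable from the previous repaired point.
        step = times[k] - times[k - 1]
        lower = repaired[k - 1] + s_min * step
        upper = repaired[k - 1] + s_max * step
        repaired.append(min(max(mid, lower), upper))
    return repaired


if __name__ == "__main__":
    ts = [0, 1, 2, 3, 4]
    xs = [1.0, 2.0, 10.0, 4.0, 5.0]  # the value at index 2 is a spike
    print(speed_constraint_repair(ts, xs, s_min=-2.0, s_max=2.0, window=1))
```

Because each point is clamped to the range implied by its repaired predecessor, the output always satisfies the speed constraint between consecutive points. A local repair of this kind can also shift points adjacent to a spike, so the choice of `window` and of the speed bounds matters in practice.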
Keywords/Search Tags: Data Cleaning, High-Dimensional Time Series, Speed Constraint, Generative Adversarial Network