Research On Lossless Compression Algorithm For Time Series

Posted on: 2022-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
Full Text: PDF
GTID: 2480306563466234
Subject: Computer technology
Abstract/Summary:
A time series is a sequence of values obtained by observing a physical quantity in chronological order; it reflects how an attribute changes over time. Compressing time series is a basic and important task: it reduces both the storage space the series occupies and the cost of transmitting it. This thesis investigates lossless compression of time series and designs one lossless compression algorithm for timestamps and two for time series data values, aiming for a high compression ratio without distorting the data. The main work and innovations are as follows:

(1) The e-DoD timestamp compression algorithm is proposed. The algorithm first takes the second-order difference of the UNIX timestamps. Second-order differences with a small absolute value are encoded with a fixed variable-length code. For those with a larger absolute value, the bit mask, trailing-zero, and offset methods are each tried, and the one that yields the shortest code is used. Variable-length control bits then govern the variable-length storage of all codes. The e-DoD algorithm effectively reduces the storage overhead of timestamps.

(2) The Pred Zip time series data value compression algorithm is proposed. It consists of two parts: a probability prediction module and an arithmetic coding module. The probability prediction module predicts the conditional probability of each character in the time series from the k characters preceding it, and the arithmetic coding module uses that conditional probability to compress the character by arithmetic coding. The highest compression ratio Pred Zip achieves on the experimental data set is 9.2. Pred Zip is compared under three probability prediction models, based on logistic regression, LSTM, and XGBoost respectively: the higher the prediction accuracy of the model, the higher the compression ratio of the algorithm.

(3) The CS-Zip time series compression algorithm is proposed. The algorithm comprises a training process and a data compression process. During training, a traversal selective labeling algorithm first labels each fixed-length data segment with its optimal data conversion method to form a training set; a data conversion method classifier is then trained on this training set. There are six data conversion methods: Delta, Reversed Delta, Delta-of-Delta, Reversed Delta-of-Delta, XOR, and Delta XOR. During data compression, the time series is divided evenly into data segments, and the classifier selects a data conversion method for each segment; the segment is converted with that method, and then each value in the converted segment is encoded with whichever coding method gives the highest compression ratio. The coding methods are Bitmask, Trailing-zero, Offset, and Rightward Offset. By selecting the compression method with the higher compression ratio for each segment, CS-Zip improves the compression ratio of the whole time series.
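The second-order differencing step that the timestamp algorithm starts from can be illustrated with a minimal Python sketch. It covers only the differencing and its exact inversion; the bit-level stages (fixed variable-length code, bit mask, trailing-zero, offset, control bits) are not specified in enough detail in this abstract to reproduce and are omitted. The function names dod_encode and dod_decode and the sample timestamps are hypothetical, chosen for illustration only.

```python
from typing import List


def dod_encode(timestamps: List[int]) -> List[int]:
    """Return [first timestamp, first delta, second-order differences...].

    Hypothetical helper: only the differencing step is shown, not the
    bit-level encoding of the resulting values.
    """
    if len(timestamps) < 2:
        return list(timestamps)
    first_delta = timestamps[1] - timestamps[0]
    encoded = [timestamps[0], first_delta]
    prev_delta = first_delta
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        # Second-order difference: change between consecutive deltas.
        encoded.append(delta - prev_delta)
        prev_delta = delta
    return encoded


def dod_decode(encoded: List[int]) -> List[int]:
    """Invert dod_encode exactly, so the transform is lossless."""
    if len(encoded) < 2:
        return list(encoded)
    timestamps = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod  # recover the first-order delta
        timestamps.append(timestamps[-1] + delta)
    return timestamps


# Assumed sample data: UNIX timestamps sampled roughly every 10 seconds.
sample = [1600000000, 1600000010, 1600000020, 1600000031, 1600000041]
encoded_sample = dod_encode(sample)
restored_sample = dod_decode(encoded_sample)
```

For regularly sampled data most second-order differences are zero or close to it, which is why a short code for small absolute values pays off in the subsequent encoding stage.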
Keywords/Search Tags: Time series, Timestamps, Data compression, Lossless compression, Segmented compression