
The Improvement Of Time Series Feature Coding And Its Application In Financial Data

Posted on: 2019-02-13 | Degree: Master | Type: Thesis
Country: China | Candidate: X K Dong
GTID: 2370330572455209 | Subject: Quantitative Economics
Abstract/Summary:
Time series analysis has important applications in many fields, such as science, economics, meteorology, and medicine. It covers many topics, including system description, prediction of dynamic systems, and pattern matching. The feature representation of time series is one of the foundations of time series analysis and a current research hotspot. In any scenario, a time series is a sequence of numerical values that change over time. As these changes accumulate, they inevitably reveal systematic trends that reflect the laws behind the business scenario. Such laws are often hard to capture, because they are hidden by heavy interference and other incidental factors in the sequence. If the time series can be converted into a suitable feature representation that effectively removes noise and expresses the system's own trend more clearly, this will provide important guidance for subsequent classification, prediction, analysis, and decision making.

Symbolic feature representation is one way to represent time series features. It reduces the dimensionality of a time series and smooths noise, which lowers the computational complexity and improves the operability of subsequent analysis, and it is favored by many scholars. Once a time series has been transformed, the distance measures, similarity measures, and prediction methods applied to the new sequences must change accordingly in subsequent analysis. Building on the symbolic coding representation of time series, this thesis improves the traditional symbolic representation method and designs a set of simple codes that characterize the trend features of a time-series system. The codes reflect the systematic changes of time-series fragments objectively and intuitively, and because the coding is "numerical", it makes it possible to predict the trend of the code sequence itself. To select the coding parameters, the thesis proposes the HIC criterion, a parameter-selection strategy that balances complexity against goodness of fit. For measuring similarity between symbolic codes of time series, it proposes the expected lower bound distance and proves its properties, including non-negativity, symmetry, the triangle inequality, the expected lower bound property, and expectation consistency. This remedies a defect of traditional coding distances, which seldom satisfy the general definition of a distance.

Finally, using financial time series as the data source, the thesis applies the algorithm in two ways to verify its soundness and applicability. First, in fast matching of typical pattern sequences and matching of time-series fragments of unequal length, the experimental results show that the symbolic coding and the coding distance effectively improve the accuracy and coverage of pattern matching. Second, trends are predicted from the symbolic coding of financial time series: the coding algorithm is combined with a regression model, an ARIMA model is built on the time series represented by the code sequence to predict the short-term subsequent trend, and comparative analysis verifies the stability of the code-sequence prediction model.

The feature-coding representation of time series proposed in this study, and the similarity measure based on it, are new attempts in the field of time-series big-data applications. The algorithm reduces computational complexity and combines readily with other downstream analysis algorithms. It therefore has a wide range of practical applications, such as similarity search and forecasting of securities price movements, pattern discovery in meteorological and geological data, and analysis of abnormal behavior in insurance and medical services.
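The abstract does not name the "traditional symbolic representation" that the thesis improves on. Assuming it is SAX (Symbolic Aggregate approXimation), a widely used scheme, the minimal sketch below shows the usual pipeline: z-normalize, reduce with Piecewise Aggregate Approximation (PAA), then map each frame mean to a letter using breakpoints that cut N(0, 1) into equiprobable regions. The function names and the alphabet size of 4 are illustrative choices, not the author's code.

```python
from bisect import bisect_left
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale to zero mean and unit population standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:  # a constant series carries no shape information
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: the means of w near-equal frames.

    Integer frame boundaries mean len(series) need not be a multiple of w.
    """
    n = len(series)
    if not 1 <= w <= n:
        raise ValueError("w must satisfy 1 <= w <= len(series)")
    return [mean(series[i * n // w:(i + 1) * n // w]) for i in range(w)]


def breakpoints(alphabet_size):
    """Cut points that split N(0, 1) into alphabet_size equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax_word(series, w, alphabet_size=4):
    """SAX word: z-normalize, apply PAA, then map each frame mean to a letter."""
    cuts = breakpoints(alphabet_size)
    # bisect_left counts the cut points strictly below each frame mean.
    return "".join(chr(ord("a") + bisect_left(cuts, v))
                   for v in paa(znormalize(series), w))


short_fragment = [1, 2, 3, 4, 5, 6, 7, 8]
long_fragment = [1 + 0.5 * i for i in range(16)]
print(sax_word(short_fragment, 4), sax_word(long_fragment, 4))  # abcd abcd
```

Because PAA always averages into a fixed number of frames, fragments of different lengths map to words of the same length, which is one simple way to compare unequal-length fragments; the abstract does not say whether the thesis handles them this way.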
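The abstract describes a set of simple "numerical" codes for the trend of each time-series fragment but does not define them. The sketch below is a hypothetical scheme for illustration only: each non-overlapping segment is coded +1, 0, or -1 according to the sign of its least-squares slope, with a dead-band threshold around zero. The segment length, the threshold, and the function names are assumptions, not the thesis's code set.

```python
def ols_slope(values):
    """Least-squares slope of values regressed on positions 0, 1, ..., m-1."""
    m = len(values)
    if m < 2:
        return 0.0
    x_mean = (m - 1) / 2
    y_mean = sum(values) / m
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(m))
    return num / den


def trend_codes(series, segment_length, threshold):
    """Hypothetical trend code: +1 rising, 0 flat, -1 falling, per segment.

    Segments do not overlap; a trailing partial segment is ignored.
    """
    codes = []
    for start in range(0, len(series) - segment_length + 1, segment_length):
        slope = ols_slope(series[start:start + segment_length])
        if slope > threshold:
            codes.append(1)
        elif slope < -threshold:
            codes.append(-1)
        else:
            codes.append(0)
    return codes


prices = [10, 11, 12, 13, 13, 13, 13, 13, 12, 11, 10, 9]
print(trend_codes(prices, segment_length=4, threshold=0.2))  # [1, 0, -1]
```

Because such codes are integers, the coded sequence can be passed directly to ordinary numeric models, which is the property the abstract points to when it says numerical coding makes trend prediction of the code sequence possible.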
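The HIC criterion is described only as a parameter-selection strategy that balances complexity against fit; its formula is not given in the abstract. To illustrate that kind of trade-off, the sketch below substitutes a standard BIC-style score for choosing the number of PAA segments w: n*ln(RSS/n) + w*ln(n), where RSS is the squared error of the piecewise-constant reconstruction. This is not HIC; the score and all function names are assumptions.

```python
import math
from statistics import mean


def paa_rss(series, w):
    """Squared error left after replacing each of w PAA frames by its mean.

    Requires 1 <= w <= len(series) so that no frame is empty.
    """
    n = len(series)
    rss = 0.0
    for i in range(w):
        frame = series[i * n // w:(i + 1) * n // w]
        m = mean(frame)
        rss += sum((x - m) ** 2 for x in frame)
    return rss


def bic_style_score(series, w):
    """BIC-style trade-off: fit term n*ln(RSS/n) plus penalty w*ln(n).

    Lower is better. An exact fit (RSS = 0, e.g. w = n) is floored to avoid
    log(0) but would still score extremely low, so keep candidate w well below n.
    """
    n = len(series)
    rss = max(paa_rss(series, w), 1e-12)
    return n * math.log(rss / n) + w * math.log(n)


def select_segments(series, candidates):
    """Return the candidate segment count with the lowest score."""
    return min(candidates, key=lambda w: bic_style_score(series, w))


levels = [0.0] * 10 + [5.0] * 10 + [0.0] * 10
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(30)]
series = [lv + e for lv, e in zip(levels, noise)]
print(select_segments(series, range(1, 7)))  # 3
```

On this three-level step series with small alternating noise, the score picks w = 3 among candidates 1 to 6: fewer segments, or segment boundaries that straddle a level change, leave large residuals, while w = 6 buys little extra fit for its larger ln(n) penalty.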
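The abstract says traditional coding distances seldom satisfy the general definition of a distance. Assuming the reference point is SAX's MINDIST (an assumption, since the abstract names no specific distance), the sketch below implements MINDIST and exhibits the problem: identical or adjacent symbols are assigned distance 0, so distinct words can be at distance 0 and the triangle inequality can fail. The thesis's expected lower bound distance is not reproduced here, because its definition is not given in the abstract; all function names are illustrative.

```python
import math
from statistics import NormalDist


def breakpoints(alphabet_size):
    """Cut points that split N(0, 1) into alphabet_size equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def symbol_dist(r, c, cuts):
    """SAX lookup-table distance between 0-based symbol indices r and c."""
    if abs(r - c) <= 1:  # identical or adjacent symbols are treated as distance 0
        return 0.0
    return cuts[max(r, c) - 1] - cuts[min(r, c)]


def mindist(word_a, word_b, n, alphabet_size):
    """MINDIST between two SAX words of length w built from series of length n.

    It lower-bounds the Euclidean distance between the z-normalized series.
    """
    if len(word_a) != len(word_b):
        raise ValueError("words must have the same length")
    cuts = breakpoints(alphabet_size)
    w = len(word_a)
    total = sum(
        symbol_dist(ord(x) - ord("a"), ord(y) - ord("a"), cuts) ** 2
        for x, y in zip(word_a, word_b)
    )
    return math.sqrt(n / w) * math.sqrt(total)


# One-letter words with n = w = 1, so the sqrt(n / w) scale factor is 1.
d_ab = mindist("a", "b", 1, 4)
d_bc = mindist("b", "c", 1, 4)
d_ac = mindist("a", "c", 1, 4)
print(d_ab, d_bc, round(d_ac, 4))  # 0.0 0.0 0.6745
print(d_ac <= d_ab + d_bc)  # False: the triangle inequality fails
```

With a 4-letter alphabet, d(a, b) = d(b, c) = 0 while d(a, c) is about 0.674, so d(a, c) > d(a, b) + d(b, c). A distance proved to satisfy the triangle inequality, as the abstract claims for the expected lower bound distance, avoids exactly this case.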
Keywords/Search Tags: time series, feature representation, similarity measure, HIC criterion, expected lower bound