Correlation And Similarity Analysis Of Stock Index Based On Data Mining Methods

Posted on: 2024-06-15 | Degree: Master | Type: Thesis | Country: China | Candidate: S W Peng | GTID: 2568306920958409 | Subject: Electronic information

Abstract/Summary:

As data and data-processing technology develop, the information that data carry attracts increasing interest. Data reflect not only direct information; the correlations between data also reveal underlying patterns of change. Extracting this latent information places higher demands on data mining. Correlations between time series are an important subject in data mining, and stock data are typical time series data. The stock market contains complex interactions, so mining the interrelationships in stock data is important for revealing the underlying information. Given the importance of data mining applications in the stock market, this paper investigates correlations between sectors and individual stocks in the Chinese stock market, and similarities in the economic development of provinces in mainland China.

Firstly, the correlation between time series is mined using sector and stock data. Pearson correlation is widely applied to study linear correlation between time series, and it can also indicate whether the series share mutual information. Transfer entropy, in turn, is often applied to measure information transfer between time series. Introducing a time delay into correlation analysis makes one series lag behind the other, so that information can flow from the leading series to the lagging one. From this perspective, both Pearson correlation with time lags and transfer entropy can serve as measures of information flow. The results point to a potential link between the two measures: Pearson correlation coefficients 
with time lags follow the same trend as transfer entropy in most cases. Higher correlations between stocks provide more information for predicting the future trend of one stock from another, and lower correlations provide less. We use a long short-term memory (LSTM) model and recurrent neural networks (RNN) to predict stock data and measure the information loss between the original and predicted data. Given the complexity of computing transfer entropy, Pearson correlation with time lags is a simple method to quantify the mutual information between stocks.

Secondly, this paper investigates the similarity of economic development among provincial economic regions in mainland China using stock index data. Commonly used similarity analysis methods are introduced. The paper measures the uncertainty of the continuous trend structure in the time series and proposes a modified information entropy for this purpose. Four forms of stock index data are then subjected to similarity measures, and the results are clustered with a spectral clustering method. The clustering results show that economic development in the coastal region and the Yangtze River basin is highly similar. There is a significant gap between the western region and the coastal region, whereas the gap between the coastal region and the Yangtze River basin is small. Over time, the economic development of some western provinces has gradually moved closer to that of the eastern region, but a gap remains.

Finally, the time series analysis methods are demonstrated through a system design. The system contains three modules: a correlation analysis module, a similarity analysis module, and an information prediction and information metric module. Each module implements its own function. The Django framework is chosen for the design of the system. The Python 
language is used to implement the back-end algorithms. The front-end interface is implemented using HTML5.

This paper has strong practical value for the study of interrelationships between stock data. Pearson correlation with time lags is found to be a simple and robust method to quantify the mutual information among stocks. The modified information entropy can be used to analyze the spatial similarity of economic development among provinces from an information perspective. In addition, the time series analysis system designed in this paper not only improves the efficiency of data analysis but also displays the analysis results more intuitively.

Keywords/Search Tags: time series, data mining, correlation, similarity, system application
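
The abstract names Pearson correlation with time lags as its simple measure of information flow between stocks but gives no formula or code. The following is a minimal sketch of that idea in Python, the back-end language the abstract mentions. It is not the thesis's implementation: the function names (`pearson`, `lagged_pearson`, `best_lag`), the restriction to non-negative lags, and the toy data are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def lagged_pearson(source, target, lag):
    """Correlate source[t] with target[t + lag] for a non-negative lag.

    A high value suggests that `target` follows `source` by `lag` steps.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(source, target)
    # Drop the last `lag` points of source and the first `lag` of target
    # so that the two slices line up as (source[t], target[t + lag]).
    return pearson(source[:-lag], target[lag:])


def best_lag(source, target, max_lag):
    """Return (lag, r) with the largest absolute correlation in 0..max_lag."""
    scores = [(lag, lagged_pearson(source, target, lag))
              for lag in range(max_lag + 1)]
    return max(scores, key=lambda item: abs(item[1]))


# Toy data (invented for this sketch): `follower` repeats `leader`
# two steps later, padded with zeros at the start.
leader = [0.5, -1.2, 0.3, 2.1, -0.7, 1.4, -2.0, 0.9, 0.1, -1.5, 1.8, -0.4]
follower = [0.0, 0.0] + leader[:-2]

lag, r = best_lag(leader, follower, max_lag=4)
print(lag, r)
```

On real data the inputs would more plausibly be return series than raw prices, and the lag scan would be run in both directions (swapping `source` and `target`) to compare which series leads. Those choices are likewise assumptions, not details taken from the abstract.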