Construction and Quantitative Analysis of Machine Learning Models Based on Financial Data

Posted on: 2021-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: T Zhu
GTID: 2428330602983738
Subject: Software engineering
Abstract/Summary:
With the development of China's economy, the financial market plays an increasingly important supporting role. The development of the financial market itself determines how efficiently it serves the real economy: the more accurate market pricing is, the more effectively the market serves the real economy. However, the efficiency of China's financial market is not high, and there is still considerable room for development. The efficient market hypothesis in financial theory holds that in a market with many competing participants, the market is efficient and competition eliminates excess returns; that is, in a fully competitive market it is impossible to predict the market's trend. Many empirical studies support this conclusion. However, research on market microstructure shows that even a market that is efficient over long periods may be inefficient over short periods. How to build an effective statistical model to capture these possible inefficiencies has therefore become a research hotspot in financial theory.

First, this thesis uses a random walk test to verify the inefficiency of China's stock and futures markets. The data cover 2014 to 2019 at minute and daily frequencies. The empirical analysis finds substantial inefficiency in the price series of financial instruments in both markets. At the minute frequency, the stock market is more inefficient than the futures market. At the daily frequency, the difference between the two is small, and the inefficiency of neither market weakens over the years, which provides the basis for the quantitative analysis that follows.

After the market inefficiency test, according to the size of the data, we propose a Gaussian mixture hidden Markov model with autocorrelation-coefficient-adjusted prediction (ACMGHMM) to model high-frequency order book data in an attempt to capture market inefficiency. First, a hidden Markov model with mixed Gaussian distributions (HMM-MGD) is used to model the returns. Based on financial market microstructure theory, a feature construction method is proposed to measure the trading intention of market participants. According to the characteristics of financial time series, a new autocorrelation coefficient adjusted prediction (ACAP) method is proposed to reduce the high volatility of the prediction results. The model is tested on tick data for actively traded contracts in China's futures market from October 18, 2019 to November 1, 2019, using the adjusted mean relative percentage error (AMRPE) and the variance of the adjusted relative percentage error (VARPE) to measure the accuracy and volatility of the prediction error. Compared with the HMM and HMM-RF models, ACMGHMM reduces AMRPE by 30.4% and 15.4% and VARPE by 65.2% and 52.3%, respectively, which shows that ACMGHMM can significantly reduce prediction volatility and improve prediction accuracy. A hypothesis test of the effectiveness of the model's trading strategy then finds that ACMGHMM's performance is limited by the small sample size and is not particularly stable: most backtest returns are significantly greater than 0, but a small share do not pass the test.

For the case where the data can be expanded, we propose a transfer-learning-based, distribution-adaptive minimum prediction interval LSTM model (TDMI-LSTM) to model high-frequency order book data. An LSTM needs a large number of training samples to realize its advantages, and it runs into three difficulties when modeling financial time series: (1) financial time series are heteroscedastic, so the data distribution varies widely and the quality of the training data cannot be guaranteed; (2) high-frequency tick data contain a large number of invalid samples, which sharply reduces the quality of the training samples; and (3) the prediction interval cannot be lengthened at will to raise the proportion of effective samples in the training set. Because of these problems, an LSTM cannot model financial time series directly end-to-end.

This thesis proposes solutions and algorithms for each difficulty. For the latter two problems, a minimum prediction interval (MPI) algorithm based on the random walk is proposed to determine the most appropriate prediction interval for the training set, increase the proportion of effective samples, and shorten the computation time. For the first problem, the HMM-MGD model is fitted to each contract's data at low frequency, and the Jensen-Shannon (JS) divergence is then used to measure the similarity of the hidden-state distributions across contracts. Because the JS divergence between mixed Gaussian distributions has no closed-form solution, the Monte Carlo method is used to sample and estimate it; the JS divergences are then clustered to classify the high-frequency data. At prediction time, each out-of-sample test sample is first decoded and classified with the fitted HMM-MGD model and then passed to the corresponding LSTM model for prediction.

The model is tested on tick data for actively traded contracts in China's futures market from October 18, 2019 to January 20, 2020, about 40 million records in total. Compared with the LSTM, LSTM-SC, and CNN-LSTM models, TDMI-LSTM reduces AMRPE by 52.4%, 46.3%, and 51.4% and VARPE by 33.8%, 38.8%, and 43.1%, respectively, indicating that TDMI-LSTM can greatly improve prediction accuracy and reduce prediction volatility. A test of the model's strategy then finds that TDMI-LSTM stably captures the characteristics of financial time series data; in the hypothesis test of the backtest results, all returns are significantly greater than 0.
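The Monte Carlo estimate of the JS divergence between two Gaussian mixtures, which the abstract mentions but does not spell out, can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes one-dimensional mixtures, NumPy as the only dependency, and hypothetical function names (`gmm_logpdf`, `gmm_sample`, `js_divergence_mc`). Each mixture is passed as a `(weights, means, stds)` tuple, and the estimator averages log-density ratios over samples drawn from each mixture.

```python
import numpy as np


def gmm_logpdf(x, weights, means, stds):
    """Log density of a 1-D Gaussian mixture at each point of x."""
    x = np.asarray(x, dtype=float)[:, None]            # shape (n, 1)
    weights, means, stds = (np.asarray(a, dtype=float) for a in (weights, means, stds))
    log_comp = (
        np.log(weights)
        - 0.5 * np.log(2.0 * np.pi)
        - np.log(stds)
        - 0.5 * ((x - means) / stds) ** 2
    )                                                  # shape (n, k)
    # Sum the components in log space for numerical stability.
    return np.logaddexp.reduce(log_comp, axis=1)


def gmm_sample(n, weights, means, stds, rng):
    """Draw n samples: pick a component per sample, then draw from it."""
    weights, means, stds = (np.asarray(a, dtype=float) for a in (weights, means, stds))
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comp], stds[comp])


def js_divergence_mc(p, q, n=20000, seed=0):
    """Monte Carlo estimate of JS(p, q) in nats.

    JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2.
    Each KL term is estimated as a sample mean of log-density ratios.
    """
    rng = np.random.default_rng(seed)
    xp = gmm_sample(n, *p, rng)
    xq = gmm_sample(n, *q, rng)

    def log_m(x):
        # log of the midpoint density m(x) = (p(x) + q(x)) / 2
        return np.logaddexp(gmm_logpdf(x, *p), gmm_logpdf(x, *q)) - np.log(2.0)

    kl_pm = np.mean(gmm_logpdf(xp, *p) - log_m(xp))
    kl_qm = np.mean(gmm_logpdf(xq, *q) - log_m(xq))
    return 0.5 * (kl_pm + kl_qm)
```

The JS divergence in nats lies between 0 (identical distributions) and ln 2 (non-overlapping distributions). Under the same assumptions, a matrix of pairwise estimates across contracts could serve as the distance input to a clustering step like the one the abstract describes.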
Keywords/Search Tags: Hidden Markov Chain, LSTM, Transfer Learning, Financial Intelligence