
Research On Stocks Data Analysis Based On Spark MLlib

Posted on: 2020-09-26  Degree: Master  Type: Thesis
Country: China  Candidate: H Gao  Full Text: PDF
GTID: 2370330578968899  Subject: Engineering
Abstract/Summary:
The stock market is a major part of a country's economy and society, but stock prices are strongly influenced by the economic environment, national policy, and domestic and foreign conditions, so the trend of a stock and its price are difficult to predict. In addition, the randomness of the stock market, the asymmetry of information, and the herd mentality of investors make stock data very difficult to analyze. Nevertheless, financial markets and stock analysis have always been a focus of research.

To analyze stock price trends more accurately, this thesis proposes a wavelet denoising method for stock trading data, computes and collects the technical indicators and sentiment factors commonly used in the stock market, applies principal component analysis to these factors, and then classifies the preprocessed data with machine learning. The stock data are classified with logistic regression, a support vector machine, and a random forest; the predictions of the three models are then combined by voting, and a comparative experiment is conducted. The experiment shows that, after noise reduction and dimensionality reduction, combined voting yields better results.

Stock price prediction is the other focus of this study. A long short-term memory (LSTM) network is used to predict stock prices, with a sliding time window for short-term prediction. Under the same number of iterations, the results of the LSTM trained on denoised, dimension-reduced data are compared with those obtained on the raw data. The network's error is then assessed with the mean absolute error, the mean squared error, the root mean squared error, and the mean absolute percentage error.

Finally, Spark MLlib distributed learning is used to run both algorithms on a cluster, and performance in the single-machine environment is compared with performance in the cluster environment. Because the dataset itself is small, the cluster improves running time but cannot fully show the advantages of Spark. In the future, as natural language processing and the text analysis and quantification of stock information progress and the dataset becomes large, Spark's advantages will become more evident.
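As a rough illustration of three steps named in the abstract (combined voting, the sliding time window, and the four error measures), the sketch below uses plain Python. It is not the thesis's implementation: the function names `majority_vote`, `sliding_windows`, and `error_metrics` are hypothetical, the voting rule is assumed to be a simple unweighted majority over class labels, and the percentage error is assumed to be the usual mean absolute percentage error.

```python
import math
from collections import Counter


def majority_vote(predictions):
    """Combine several models' label lists into one list by unweighted majority.

    `predictions` holds one list of labels per model, all of equal length.
    With three binary classifiers a tie cannot occur; in other setups a tie
    falls to the label that appears first among the models (an assumption).
    """
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def sliding_windows(series, window):
    """Build (inputs, target) pairs: `window` past values predict the next value."""
    return [
        (series[i:i + window], series[i + window])
        for i in range(len(series) - window)
    ]


def error_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (in percent) for two equal-length lists.

    MAPE divides by the actual values, so they must be non-zero.
    """
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}
```

In use, each of the three classifiers would supply one list of up/down labels to `majority_vote`, while `sliding_windows` would turn a price series into training pairs for a sequence model whose predictions are then scored with `error_metrics`.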
Keywords/Search Tags: Stock Analysis, Principal Component Analysis, Combination Voting, Long Short-term Memory Network