| More and more people are entering the stock market hoping to build wealth through stock trading. However, lacking the relevant stock knowledge and experience in investment and financial management, many of them have suffered heavy losses. In securities investment analysis, stock selection and timing are the two issues investors care about most. As the number of listed stocks grows, investors must not only identify the potential buy and sell points of each stock accurately, but also select a portfolio that reduces risk. Stock clustering groups stocks that show similar trends in the same or in different time periods, which can guide portfolio construction. Stock forecasting extracts patterns of market change from historical stock data, which can guide trading decisions. To address the low signal-to-noise ratio, nonlinearity, non-stationarity and non-normality of financial time series, this thesis proposes a stock time series clustering method based on a convolutional autoencoder and the K-Means algorithm, in which the convolutional autoencoder performs feature extraction and dimensionality reduction. Portfolio construction generally draws stocks from different sectors, because stocks within the same sector are highly correlated. Clustering 100 stocks from different sectors, however, shows that stocks in different sectors may also be correlated, so a portfolio cannot be chosen by sector alone. A more rigorous approach is to cluster the stock series and use the clustering results when deciding which stocks from different sectors to add to the portfolio. Because clustering is unsupervised, different algorithms cannot be compared reliably using internal indicators. This thesis therefore selects 25 datasets from the
UCR Time Series Archive and uses the adjusted Rand index and normalized mutual information as evaluation metrics. The experimental results show that the proposed method performs better than the compared methods. Because traditional stock forecasting methods have low accuracy, this thesis also proposes a stock prediction method based on GBDT. Starting from the original data, we add time features (year, month, day) and statistical features (moving average, daily returns, Bollinger Bands). We then combine LR, XGBoost and LightGBM to predict the rise and fall of the stock price, reaching an accuracy of 57.6%, which is better than that of GRU, a deep learning method. Finally, the LightGBM model is used to analyze the importance of the different features, which improves the interpretability of the model. Building on this work, the thesis implements a stock big data analysis platform based on Flask that provides stock visualization, stock clustering and stock forecasting. The system uses Hadoop, Spark and MySQL as the back-end data storage and processing platform to speed up data processing. uWSGI and Nginx serve network requests to achieve high concurrency and load balancing. A Redis database is used for caching, which relieves server load and speeds up access. Finally, system testing is carried out to verify the availability of the system. |
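
The time and statistical features named in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the thesis implementation: the function name `build_features`, the 5-day window and the band width of 2 standard deviations are assumptions, the daily feature is taken to be the one-day relative change in the closing price, and the label simply marks whether the next day's close is higher than the current one.

```python
from datetime import date
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def build_features(dates: Sequence[date], closes: Sequence[float],
                   window: int = 5, num_std: float = 2.0) -> List[Dict[str, float]]:
    """Build one feature row per trading day from daily closing prices.

    dates and closes are ordered oldest first. The defaults for window and
    num_std are illustrative assumptions, not values taken from the thesis.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(dates) != len(closes):
        raise ValueError("dates and closes must have the same length")

    rows = []
    # Stop one day early: the label needs the close of day t + 1.
    for t in range(window - 1, len(closes) - 1):
        recent = closes[t - window + 1:t + 1]  # last `window` closes, day t included
        ma = mean(recent)
        sd = pstdev(recent)  # population standard deviation of the window
        rows.append({
            # Time features
            "year": dates[t].year,
            "month": dates[t].month,
            "day": dates[t].day,
            # Statistical features
            "moving_average": ma,
            "daily_return": closes[t] / closes[t - 1] - 1,
            "boll_upper": ma + num_std * sd,
            "boll_lower": ma - num_std * sd,
            # Target: 1 if the next close is higher than today's, else 0
            "label_up": int(closes[t + 1] > closes[t]),
        })
    return rows
```

Each row uses only prices up to and including day `t`, so the features do not leak the next day's close that the label is derived from. The resulting rows could then be passed to any classifier, such as the models listed in the abstract.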