| China's capital market has a large number of individual investors, which makes stock prices vulnerable to investor sentiment. Investor sentiment is the emotional attitude that investors form after weighing information such as national policies, the market environment and corporate profitability. Investors' decisions change as their sentiment changes, and this in turn affects stock prices. Anta Sports, as the leading domestic sports brand, occupies an important position among domestic sports stocks. This paper incorporates investor sentiment into the set of stock price predictors and uses modern machine learning algorithms to predict the stock price of Anta Sports. First, using web crawling in R, this paper collects stock data for Anta Sports from the Eastmoney (Oriental Wealth) website for a total of 741 trading days from 2019 to 2021. The extracted indicators include the opening price, the percentage price change, the deviation rate (BIAS) and the energy tide (on-balance volume, OBV). Over the same period, we crawled investors' posts about Anta Sports on each trading day from the Eastmoney stock bar (Guba) forum, obtaining a total of 1,321 text records. The fields in the text data include the number of views, the number of comments, the comment content, the author and the time. Next, the extracted data are processed. For the numerical data, the variables are standardized to remove the influence of differing scales on the prediction results, and their dimensionality is reduced by principal component analysis (PCA); four principal components are extracted to represent the stock price information in the original data. For the text data, to better extract investor sentiment, we use seven basic machine learning classifiers to separate investors' original posts from non-original texts such as news and announcements, and then discard the non-original texts. The original posts are then normalized and segmented into words, and a word cloud is drawn to show the distribution of investors' positive and negative sentiment words. The segmented text is then matched against a sentiment dictionary, and the matches are summed row-wise to obtain investors' sentiment score for each trading day, which completes the extraction and quantification of investor sentiment. Finally, this paper uses modern machine learning methods to predict the stock price. The extracted investor sentiment is incorporated into the model as a factor affecting the stock price, and PCA-SVR, PCA-RF and PCA-XGBoost models are built. The optimized R-squared values of the three models are 93.8%, 92.6% and 95.2% respectively, and the MSE values are 0.28, 0.29 and 0.20 respectively. Together with the plots comparing the models' fitted values with the actual values, these results indicate that all three models predict the stock price accurately, which supports the effectiveness of the approach. Comparing the three algorithms shows that the PCA-XGBoost model predicts best: the regularization term in its objective function reduces the variance of the model and helps avoid overfitting, and the model also runs quickly. Overall, therefore, PCA-XGBoost is a well-suited algorithm for predicting the stock price of Anta Sports. It is worth noting that once the investor sentiment index is included in the three models, their R-squared values increase significantly and their MSE values decrease, which further confirms that investor sentiment is an important factor in stock price prediction. |
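
The dictionary-matching step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation (the paper works in R): the tiny lexicon, the `post_score` and `daily_sentiment` helper names, and the sample posts are all hypothetical, and the posts are assumed to be already segmented into words.

```python
from collections import defaultdict

# Hypothetical toy sentiment dictionary: word -> polarity (+1 positive, -1 negative).
# A real study would load a full sentiment lexicon instead.
SENTIMENT_DICT = {"rise": 1, "buy": 1, "strong": 1, "fall": -1, "sell": -1, "weak": -1}


def post_score(tokens, lexicon=SENTIMENT_DICT):
    """Sum the polarity of every token found in the lexicon (unknown words count as 0)."""
    return sum(lexicon.get(token, 0) for token in tokens)


def daily_sentiment(posts, lexicon=SENTIMENT_DICT):
    """Aggregate post-level scores into one sentiment score per trading day.

    posts: iterable of (date_string, list_of_tokens) pairs.
    """
    totals = defaultdict(int)
    for date, tokens in posts:
        totals[date] += post_score(tokens, lexicon)
    return dict(totals)


# Hypothetical, already-segmented posts.
posts = [
    ("2021-03-01", ["strong", "buy"]),
    ("2021-03-01", ["sell", "hold"]),
    ("2021-03-02", ["fall", "weak", "sell"]),
]
scores = daily_sentiment(posts)
```

Each daily score can then be joined to the price features by date, so that sentiment enters the prediction model as one additional input column.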
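
The two metrics used to compare the models, MSE and R-squared, follow their standard definitions, sketched below in plain Python. The helper names `mse` and `r_squared` and the sample values are illustrative only and are not taken from the paper's data.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
error = mse(y_true, y_pred)
fit = r_squared(y_true, y_pred)
```

A lower MSE and a higher R-squared both indicate a closer fit, which is the sense in which the abstract ranks PCA-XGBoost ahead of the other two models.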