A BO_SVR Method Based on News Headlines for Soybean Price Forecasting

Posted on: 2024-01-08  Degree: Master  Type: Thesis
Country: China  Candidate: Y K Wang  Full Text: PDF
GTID: 2568307121494914  Subject: Agricultural engineering and information technology
Abstract/Summary:
As the Chinese saying goes, food is the first necessity of the people. Agricultural products play an important role in the stable development of society, and their prices affect social security and people's happiness and wellbeing. In recent years, however, the market prices of agricultural products have fluctuated very frequently, causing considerable difficulty for people's livelihoods and for the management of agricultural products. At the same time, because China has a wide variety of agricultural products, the related data are large in volume and varied in type, which makes it very difficult to use agricultural data to predict short-term or long-term price trends. Effective prediction of agricultural product prices can therefore provide a scientific basis for the government to formulate economic policies and regulate prices, and it also has practical significance for stable production in the agricultural product market. Taking soybean news headlines from the Agricultural Information Network and soybean prices from the Dalian Commodity Exchange as examples, this paper combines BERT pre-training, BERTopic topic modeling, BERT-BiLSTM sentiment analysis, and support vector regression (SVR) with Bayesian hyperparameter optimization to build a combined soybean price prediction model. The specific work is as follows:

(1) Pre-training on news headline text based on the BERT model. Soybean news headlines are short and sparse. Preprocessing is a fundamental step in text mining and natural language processing, and includes word tokenization, stop-word filtering, and word embedding. The first two steps convert the text into a collection of words with unimportant words removed. Word embedding is, in short, a dimensionality reduction technique that maps high-dimensional words (unstructured information) to low-dimensional numerical vectors (structured information). It converts a document into a mathematical representation that a computer can read as input, and is therefore critical for text analysis problems. BERT is a stack of Transformer encoders, and it is trained without labels by masking 15% of the input tokens and predicting them. In studies of the effect of training set size across a series of tasks, the performance of non-contextual embeddings (GloVe, random) increases rapidly as the amount of training data increases, typically coming within 5% to 10% of the accuracy of BERT embeddings when the full training set is used. For many tasks, these embeddings may match BERT given enough data.

(2) Topic modeling based on BERTopic, compared with the traditional topic models LDA and Top2Vec. LDA represents text as a bag of words, ignoring word order and deep semantics, so its representational capability is limited. Top2Vec clusters with HDBSCAN, a density-based method, but computes each topic vector from a centroid-based perspective: the topic vector is the average of the vectors in the same cluster, which can yield an inaccurate topic vector and therefore an inaccurate topic representation. BERTopic embeds the text with a pre-trained BERT model and uses class-based TF-IDF (c-TF-IDF) to represent each cluster, which makes the topics interpretable and retains the important words in each topic.

(3) Sentiment analysis based on BERT and bidirectional long short-term memory (Bi-LSTM). News headlines usually mention influencing factors and sentiment tendencies. These tendencies can be seen as the market's judgment of how the influencing factors affect agricultural futures prices. In this paper, sentiment tendency refers to the direction and strength of an influencing factor's effect on agricultural futures prices: a positive tendency means that prices rise, and the larger its value, the larger the price increase. The text is segmented into words, stop words are removed, positive and negative keywords are extracted, and a sentiment score is calculated. The BERT+Bi-LSTM model is used for sentiment analysis. Bi-LSTM is a classical deep learning model suitable for processing large-scale text data. It can extract information related to the order of words in a sentence, taking into account long-range contextual dependencies between words.

(4) Price prediction with SVR whose hyperparameters are tuned by Bayesian optimization (BO_SVR). The model is compared with the traditional prediction models random forest (RF) and XGBoost.
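The keyword-based scoring step in (3) (segment the text, remove stop words, count positive and negative keywords, compute a score) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not give its lexicons or its scoring formula, so the word lists, the `sentiment_score` function, and the normalized difference `(pos - neg) / (pos + neg)` are hypothetical. English regex tokenization also stands in for the Chinese word segmentation the original headlines would need.

```python
import re

# Hypothetical lexicons for illustration only; the thesis does not list its keywords.
POSITIVE = {"surge", "rise", "shortage", "rally", "strong"}
NEGATIVE = {"fall", "drop", "glut", "weak", "slump"}
STOPWORDS = {"the", "a", "an", "of", "on", "in", "as", "to", "and", "for"}


def tokenize(headline):
    """Lowercase the headline and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", headline.lower())


def sentiment_score(headline):
    """Return (pos - neg) / (pos + neg), or 0.0 when no keyword matches.

    This formula is an assumed, common choice, not the one used in the thesis.
    """
    tokens = [t for t in tokenize(headline) if t not in STOPWORDS]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


if __name__ == "__main__":
    for text in [
        "Soybean futures rally as supply shortage worsens",
        "Soybean prices drop on weak demand",
        "Soybean planting begins",
    ]:
        print(f"{sentiment_score(text):+.2f}  {text}")
```

A score near +1 would correspond to headlines dominated by price-raising keywords and a score near -1 to price-lowering ones, in line with the sign convention described in (3). A lexicon score of this kind ignores word order, which is the gap a sequence model such as Bi-LSTM is meant to close.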
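The c-TF-IDF weighting mentioned in (2) can be sketched as follows. This assumes the formulation documented for BERTopic, in which the weight of term t in cluster c is tf(t, c) * log(1 + A / f(t)), where A is the average number of words per cluster and f(t) is the frequency of t across all clusters. The function names, the toy clusters, and the omission of any term-frequency normalization are assumptions for illustration; the clusters are given directly rather than produced by embedding and clustering real headlines.

```python
import math
from collections import Counter


def c_tf_idf(tokens_by_topic):
    """Compute class-based TF-IDF weights.

    tokens_by_topic maps a topic id to the list of tokens pooled from all
    documents in that cluster. Returns {topic: {term: weight}}.
    """
    # Term frequency inside each cluster.
    tf = {topic: Counter(tokens) for topic, tokens in tokens_by_topic.items()}

    # Frequency of each term across all clusters.
    total_tf = Counter()
    for counts in tf.values():
        total_tf.update(counts)

    # Average number of words per cluster (A in the formula above).
    avg_words = sum(total_tf.values()) / len(tf)

    return {
        topic: {
            term: count * math.log(1 + avg_words / total_tf[term])
            for term, count in counts.items()
        }
        for topic, counts in tf.items()
    }


def top_terms(weights, k=2):
    """Return the k highest-weighted terms of one topic."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy clusters (hypothetical data).
    clusters = {
        0: ["soybean", "import", "tariff", "soybean", "tariff", "import", "tariff"],
        1: ["soybean", "drought", "yield", "drought", "soybean", "yield", "drought"],
    }
    weights = c_tf_idf(clusters)
    for topic in clusters:
        print(topic, top_terms(weights[topic]))
```

In this toy data "soybean" occurs in both clusters, so it receives a lower weight than the cluster-specific terms, which is how the weighting keeps the words that distinguish one topic from another.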
Keywords/Search Tags:Agricultural product price forecast, Topic modeling, Natural language processing, Sentiment analysis