Research On Text Classification And News Recommendation Algorithm Based On Word Embedding

Posted on:2023-11-29

Degree:Master

Type:Thesis

Country:China

Candidate:P Zhou

GTID:2568306836470014

Subject:Control Science and Engineering

Abstract/Summary:

With the arrival of the Internet information age, online news has become an important channel through which people learn about the outside world. Faced with the massive volume of news flooding social platforms, users find it difficult to quickly filter out the news that matches their personal needs, and recommendation algorithms were developed to address this problem. Existing news recommendation algorithms, however, often neglect the timeliness of news and its semantic information. This paper proposes a news recommendation algorithm based on word embedding and time factors. The algorithm uses mainstream word embedding technology to process text, analyzes users' existing interests, and uses a collaborative filtering algorithm to infer users' potential interests. To account for the timeliness of news, this paper proposes a news hotness calculation method and combines it with the recommendation algorithm. Experiments on real data sets demonstrate the effectiveness of the method.

In addition, social platforms face the demanding task of automatically classifying massive amounts of news. This paper analyzes existing news text classification algorithms and proposes an improved text classification algorithm. The algorithm combines a convolutional neural network (CNN) with a recurrent neural network (RNN) and introduces a multi-head self-attention mechanism to learn relevant information from different representation subspaces. Experiments on real data sets analyze the effect of this mechanism on news text classification.

The main results of this paper fall into the following three aspects:

1. This paper proposes an improved text classification algorithm based on multi-head self-attention. A CNN extracts features between words, and a bidirectional GRU extracts features from the sequence information of news headlines. The multi-head self-attention mechanism is applied to the outputs of the CNN and the GRU to learn a representation of the text, and the model output is then mapped to the label dimension through a pooling layer and a fully connected layer. This paper extracts a total of 100,000 news items in ten categories from the news dataset and uses the proposed model for multi-label classification. The results show that its classification accuracy is higher than that of existing improved text classification algorithms based on CNN and RNN.

2. This paper proposes a news recommendation algorithm based on word embedding. First, the user's reading history is analyzed, and the TF-IDF algorithm is used to select representative keywords from the news the user has read. The extracted keywords are converted by the BERT model into word vectors that can be used directly in computation. The Euclidean distances between the word vectors are then calculated and the vectors are clustered to obtain several interest center vectors for the user. Combining the user's interest vectors with the word vectors extracted from the candidate news, a content similarity formula is proposed to measure the similarity between the user's interests and the candidate news.

3. Considering time, users' existing interests, and users' potential interests, this paper proposes a hybrid recommendation algorithm. Because news is time-sensitive, recommendations need to focus on recently published news. This paper therefore takes the time factor into account and proposes a news hotness formula. The initial hotness of a news item is determined by its number of readers, the popularity of its topic words, and its publication time, and a time decay function periodically attenuates the hotness so that once-popular news does not keep being recommended to users long after publication. The news hotness formula is fused with the content similarity formula and the collaborative filtering similarity formula, and the fused method is verified on the news data set. Compared with other methods, the proposed method improves precision, recall, and F1 score.
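
The abstract does not give the news hotness formula or the fusion formula, so the following Python sketch only illustrates the general idea of decaying hotness over time and fusing it with two similarity scores. Everything specific in it is an assumption made for illustration: the exponential half-life decay, the log scaling of reader counts, the squashing of hotness into [0, 1), the weights, and all function and field names (`initial_hotness`, `decayed_hotness`, `hybrid_score`, `rank`). It is not the thesis's actual method.

```python
import math


def initial_hotness(readers, topic_popularity, w_readers=0.7, w_topic=0.3):
    """Hypothetical initial hotness from reader count and topic-word popularity.

    log1p damps very large reader counts; the weights are illustrative only.
    """
    return w_readers * math.log1p(readers) + w_topic * topic_popularity


def decayed_hotness(h0, age_hours, half_life_hours=24.0):
    """Assumed exponential decay: hotness halves every `half_life_hours`."""
    return h0 * 0.5 ** (age_hours / half_life_hours)


def hybrid_score(content_sim, cf_sim, hotness, alpha=0.5, beta=0.3, gamma=0.2):
    """Assumed linear fusion of content similarity, CF similarity and hotness."""
    hot_norm = hotness / (1.0 + hotness)  # squash non-negative hotness into [0, 1)
    return alpha * content_sim + beta * cf_sim + gamma * hot_norm


def rank(candidates):
    """Return candidate ids ordered by descending hybrid score."""
    scored = []
    for c in candidates:
        h0 = initial_hotness(c["readers"], c["topic"])
        hot = decayed_hotness(h0, c["age_hours"])
        scored.append((hybrid_score(c["content_sim"], c["cf_sim"], hot), c["id"]))
    scored.sort(reverse=True)
    return [cid for _, cid in scored]


# Two made-up candidates that differ only in how long ago they were published.
candidates = [
    {"id": "stale", "content_sim": 0.6, "cf_sim": 0.4,
     "readers": 500, "topic": 0.8, "age_hours": 240.0},
    {"id": "fresh", "content_sim": 0.6, "cf_sim": 0.4,
     "readers": 500, "topic": 0.8, "age_hours": 2.0},
]
print(rank(candidates))
```

In this sketch the publication time enters only through the article's age at scoring time, so two otherwise identical articles are separated purely by recency. A real system would tune the half-life and the fusion weights on validation data.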

Keywords/Search Tags:

Recommendation Algorithm, BERT Model, Word Embedding, Text Classification, Multi-Head Self-Attention

Related items

1	Research On Text Emotion Classification Based On BERT Embedding
2	Research On The Classification Method Of Enterprise National Economy Industry Based On BERT Model
3	Research On News Texts Classification Based On Keyword Extraction And BERT Word Embedding
4	Research On Text Classification Algorithm Based On Word Embedding Model
5	Research On News Text Classification Based On Multi-head Attention Mechanism And Feature Fusion
6	Research On Text Multi-Feature Classification Algorithm Based On BERT-LSTM
7	Cross-Lingual Text Classification Based On Monolingual Word Embedding Mapping Without Parallel Corpus
8	Research On Short Text Summarization Generation Method Based On Deep Learning
9	Research On Chinese-Korean Cross-lingual Text Classification Method Based On Bilingual Topical Word Embedding Model
10	Research On Text Classification Based On Multi Word Vector Integration And Neural Network