Chinese News Text Classification Combining Keyword Extraction And Attention Mechanism

Posted on:2024-07-14

Degree:Master

Type:Thesis

Country:China

Candidate:X Li

Full Text:PDF

GTID:2558306920953989

Subject:Electronic information

Abstract/Summary:

With the rapid rise of Internet technology, the amount of news text data on the Internet is growing exponentially, and the lack of effective management and utilization of this data is becoming increasingly prominent. Accurately classifying massive volumes of Chinese news text has therefore become an important task in Chinese natural language processing. Traditional methods based on statistics and machine learning are no longer adequate for classifying large-scale news text data, and in recent years more and more scholars have turned to deep learning methods for natural language processing tasks. However, current research methods still have two problems. First, the sparse feature distribution of Chinese news text data weakens feature extraction. Second, neural networks ignore context information during feature extraction. Both problems degrade the classification performance of the model. This dissertation addresses them as follows.

To address the sparse distribution of Chinese news text features, this dissertation designs a keyword extraction algorithm based on a weighted graph model, which tackles the problem by processing the original data. Building on the original TextRank keyword extraction algorithm, it uses the similarity between words, computed from fastText word vectors, as the edge weights of the graph in order to improve keyword extraction. The algorithm proceeds in three steps. First, the fastText word vector algorithm is trained on the original text data to obtain word vectors. Second, each word in the text is connected to every word within a fixed window to form a graph, and the similarity between word vectors is used as the edge weight. Finally, the words are ranked by importance through iterative calculation. Experimental results show that this weighted-graph keyword extraction algorithm achieves higher accuracy on Chinese keyword extraction than other traditional algorithms.

To address the low classification performance caused by the model ignoring context information during feature extraction, this dissertation designs a feature extraction network based on a time-series network and an attention mechanism. The model takes the output of the keyword extraction algorithm above as input and uses an LSTM to obtain a feature representation of the text at every time step, which is then fed into a self-attention mechanism. The self-attention mechanism produces feature representations that incorporate context information, and the features of all time steps are integrated into a single text feature, which is output to complete the classification. Experimental results show that the proposed Chinese news text classification model based on keyword extraction and attention mechanism significantly improves precision, recall, F1-score, and accuracy compared with other models.
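The keyword extraction steps described above can be illustrated with a short Python sketch. It is not the dissertation's implementation, and several details are assumptions: the toy three-dimensional vectors stand in for fastText embeddings trained on the corpus, the tokens are assumed to be already segmented with stop words removed, the window size of 3 and damping factor d = 0.85 are illustrative, mapping cosine similarity to (1 + cos) / 2 to keep edge weights non-negative is a hypothetical choice, and the update rule is the standard weighted TextRank formula rather than one stated in the abstract.

```python
import math

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_graph(tokens, vectors, window=3):
    """Connect each word to every word inside a window of `window` tokens.

    Edge weight = (1 + cosine similarity) / 2, an assumed mapping that keeps
    weights in [0, 1]; `vectors` stands in for fastText word vectors.
    """
    graph = {w: {} for w in tokens}
    for i, wi in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            wj = tokens[j]
            if wi == wj:
                continue
            weight = (1.0 + cosine(vectors[wi], vectors[wj])) / 2.0
            graph[wi][wj] = weight  # undirected graph: store both directions
            graph[wj][wi] = weight
    return graph

def weighted_textrank(graph, d=0.85, max_iter=100, tol=1e-6):
    """Iterate WS(v) = (1 - d) + d * sum_u w(u, v) / out(u) * WS(u) until stable."""
    if not graph:
        return []
    scores = {w: 1.0 for w in graph}
    out_sum = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    for _ in range(max_iter):
        new = {}
        for w in graph:
            rank = sum(graph[u][w] / out_sum[u] * scores[u]
                       for u in graph[w] if out_sum[u] > 0)
            new[w] = (1 - d) + d * rank
        delta = max(abs(new[w] - scores[w]) for w in graph)
        scores = new
        if delta < tol:
            break
    # Importance order of words, highest score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical segmented news snippet and toy word vectors
tokens = ["球队", "比赛", "进球", "球队", "教练", "比赛", "天气"]
vectors = {
    "球队": [0.9, 0.1, 0.0],
    "比赛": [0.8, 0.3, 0.1],
    "进球": [0.7, 0.2, 0.0],
    "教练": [0.6, 0.4, 0.1],
    "天气": [0.0, 0.1, 0.9],
}
ranking = weighted_textrank(build_graph(tokens, vectors, window=3))
for word, score in ranking:
    print(word, round(score, 3))
```

Because every edge is weighted by vector similarity, a word that co-occurs in a window but is semantically unrelated to its neighbours (here 天气, "weather", in a sports snippet) receives weaker edges and ends up at the bottom of the ranking.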
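The LSTM and self-attention classifier described above can likewise be sketched as a forward pass in dependency-free Python. This is a simplified illustration under stated assumptions, not the dissertation's network: the weights are random and untrained, the LSTM is single-layer and unidirectional, the self-attention is plain scaled dot-product attention over the LSTM outputs with no learned query/key/value projections, the per-step features are integrated by mean pooling, and the input vectors are random stand-ins for the word vectors of extracted keywords. A real model would be built and trained in a deep learning framework.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class LSTM:
    """Single-layer, unidirectional LSTM returning the hidden state at every time step."""

    GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate cell

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t; h_{t-1}]
        self.W = {k: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for k in self.GATES}
        self.b = {k: [0.0] * hidden_dim for k in self.GATES}

    def forward(self, xs):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            z = x + h  # list concatenation [x_t; h_{t-1}]
            pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
                   for k in self.GATES}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            g = [math.tanh(v) for v in pre["g"]]
            # c_t = f * c_{t-1} + i * g and h_t = o * tanh(c_t), element-wise
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            outputs.append(h)
        return outputs

def self_attention(H):
    """Scaled dot-product self-attention with Q = K = V = H (no learned projections)."""
    d = len(H[0])
    out = []
    for q in H:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in H]
        weights = softmax(scores)
        # Each output step is a weighted sum over all steps, so it mixes in context
        out.append([sum(w * h[t] for w, h in zip(weights, H)) for t in range(d)])
    return out

def classify(xs, lstm, W_out, b_out):
    H = lstm.forward(xs)       # feature representation at every time step
    A = self_attention(H)      # context-aware features
    pooled = [sum(col) / len(A) for col in zip(*A)]  # integrate steps into one text feature
    logits = [a + b for a, b in zip(matvec(W_out, pooled), b_out)]
    return softmax(logits)     # class probabilities

rng = random.Random(0)
num_classes, input_dim, hidden_dim = 3, 4, 8
lstm = LSTM(input_dim, hidden_dim, rng)
W_out = rand_matrix(num_classes, hidden_dim, rng)
b_out = [0.0] * num_classes
keyword_vectors = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(5)]
probs = classify(keyword_vectors, lstm, W_out, b_out)
print([round(p, 3) for p in probs])
```

Mean pooling is only one way to integrate the per-step features into a text feature; an attention-weighted sum or the final hidden state are common alternatives, and the abstract does not say which the dissertation uses.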

Keywords/Search Tags:

text classification, keyword extraction, attention mechanism, long short-term memory network

Related items

1	Research Of Online Comment Text Sentiment Classification Based On Long-short Term Memory Network
2	Research On Relation Classification Via Bidirectional Long Short-Term Memory Networks With Attention Mechanism
3	Text Sentiment Classification Based On Attention Mechanism
4	Short Text Sentiment Classification Based On Deep Learning
5	Research On Chinese Event Extraction Via Incorporating Attention Mechanism And Long Short-Term Memory Networks
6	Research On Text Classification Method Combining Attention Mechanism And Bi-GRU
7	Research On News Text Classification Method Based On Hybrid Model
8	Research On Chinese Relation Extraction For Complex Text Structure
9	Bi-LSTM Short Text Emotion Analysis Combining Semantic And Self-attention Mechanism
10	Text Classification Research Based On Deep Neural Network And Attention Mechanism