Chinese News Text Classification Combining Keyword Extraction And Attention Mechanism

Posted on: 2024-07-14  Degree: Master  Type: Thesis
Country: China  Candidate: X Li  Full Text: PDF
GTID: 2558306920953989  Subject: Electronic information
Abstract/Summary:
With the rapid rise of Internet technology, the amount of news text on the network is growing exponentially, and the lack of effective management and use of this data is becoming increasingly prominent. Accurately classifying massive Chinese news text data has therefore become an important task in Chinese natural language processing. Traditional methods based on statistics and machine learning can no longer cope with classification over large-scale news text data, and in recent years more and more scholars have turned to deep learning methods for natural language processing tasks. However, current research methods still have some problems. First, the sparse feature distribution of Chinese news text weakens feature extraction. Second, the neural network ignores context information during feature extraction. Both problems degrade the classification performance of the model. This dissertation focuses on the following work.

To address the sparse distribution of Chinese news text features, this dissertation designs a keyword extraction algorithm based on a weighted graph model, which tackles the problem by preprocessing the original data. Building on the original TextRank keyword extraction algorithm, it uses the similarity between words, computed from fastText word vectors, as the edge weights of the graph to improve keyword extraction. The algorithm proceeds in three steps. First, the fastText word vector algorithm is trained on the original text data to obtain word vectors. Second, each word in the text is connected to all words within a fixed window to form a graph, and the similarity between word vectors is used as the edge weight. Finally, the importance ranking of the words is obtained through iterative calculation. The experimental results show that the Chinese keyword extraction algorithm based on the weighted graph model is more accurate than other traditional algorithms.

To address the problem that the model ignores context information during feature extraction, which lowers classification performance, this dissertation designs a feature extraction network based on a time series network and an attention mechanism. The model takes the data processed by the keyword extraction algorithm as input, uses an LSTM to obtain the feature representation of the text at every time step, and feeds these representations into a self-attention mechanism. This mechanism yields feature representations that incorporate context information; the features of all time steps are then integrated into a single text feature, which is output to complete the text classification. The experimental results show that the Chinese news text classification model based on keyword extraction and the attention mechanism improves significantly in precision, recall, F1-score and accuracy compared with other models.
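
The keyword extraction steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the `vectors` argument stands in for trained fastText word vectors, and the function names, the default window size, the damping factor of 0.85, the convergence tolerance, and the clipping of negative similarities to zero are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_textrank(tokens, vectors, window=3, damping=0.85, iters=50, tol=1e-6):
    """Rank words by a TextRank-style score on a similarity-weighted graph.

    tokens  : list of words in reading order (already segmented and filtered)
    vectors : dict word -> list of floats (assumed to come from a word-vector model)
    window  : number of consecutive tokens in one window, including the current one
    """
    # Step 2 of the abstract: connect each word to the words in its window,
    # using word-vector similarity as the (undirected) edge weight.
    weights = defaultdict(dict)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            v = tokens[j]
            if v == w or w not in vectors or v not in vectors:
                continue
            sim = max(cosine(vectors[w], vectors[v]), 0.0)  # assumption: drop negatives
            if sim > 0:
                weights[w][v] = sim
                weights[v][w] = sim

    nodes = list(weights)
    if not nodes:
        return []

    # Step 3: iterate the weighted PageRank update until the scores settle.
    out_sum = {n: sum(weights[n].values()) for n in nodes}
    scores = {n: 1.0 for n in nodes}
    for _ in range(iters):
        new_scores = {}
        for n in nodes:
            rank = sum(weights[m][n] / out_sum[m] * scores[m] for m in weights[n])
            new_scores[n] = (1 - damping) + damping * rank
        delta = max(abs(new_scores[n] - scores[n]) for n in nodes)
        scores = new_scores
        if delta < tol:
            break

    # Words sorted from most to least important.
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `weighted_textrank(tokens, vectors)` returns the words that ended up in the graph, ordered by score; taking the first few entries would give the extracted keywords. Replacing the similarity weight with a constant 1 reduces the sketch to unweighted TextRank, which makes the effect of the word-vector weighting easy to compare.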
Keywords/Search Tags: text classification, keyword extraction, attention mechanism, long short-term memory network
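
The attention step in the abstract, where per-time-step LSTM features are combined with context and then merged into one text feature, can be illustrated with a small sketch. It assumes the LSTM hidden states are already available as a list of vectors, uses scaled dot-product self-attention without learned projection matrices, and merges the time steps by mean pooling. All three choices, and the function names, are assumptions made for the example; the abstract does not specify the attention variant or the pooling used.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_pool(hidden_states):
    """Turn per-time-step features into one context-aware text feature.

    hidden_states : non-empty list of T vectors of equal length d, assumed to be
                    the LSTM outputs for each time step.
    Returns a single vector of length d.
    """
    d = len(hidden_states[0])
    scale = math.sqrt(d)

    # Each time step attends to every time step (queries = keys = values = hidden states).
    contextual = []
    for query in hidden_states:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in hidden_states]
        attn = softmax(scores)
        contextual.append(
            [sum(w * h[i] for w, h in zip(attn, hidden_states)) for i in range(d)]
        )

    # Assumption: merge the context-aware features of all time steps by averaging.
    steps = len(contextual)
    return [sum(c[i] for c in contextual) / steps for i in range(d)]
```

The returned vector would then be passed to a classification layer. A trained model would normally add learned query, key and value projections, which are omitted here to keep the sketch dependency-free.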