Research On Text Classification Based On Word Vector And Deep Learning

Posted on:2024-04-03

Degree:Master

Type:Thesis

Country:China

Candidate:Z Wan

Full Text:PDF

GTID:2568307154998769

Subject:Master of Electronic Information (Professional Degree)

Abstract/Summary:

Text classification is a fundamental task in natural language processing, widely used in news recommendation, search engines, spam detection, sentiment analysis, and other applications. In recent years, researchers in China and abroad have achieved fruitful results in classifying long texts, but progress on short texts has been slower. This thesis addresses two problems in short-text classification: new words, or words that take on new meanings when placed in quotation marks (collectively called "noisy words" in this thesis), interfere with classification; and text convolutional neural networks do not capture features comprehensively in short texts. Two innovations are proposed.

First, because news headlines are written to be readable, other important words in a headline can help explain its noisy words. The thesis therefore finds the words that matter most to the classification result and uses them to replace the noisy words, reducing their interference with classification. TF-IDF measures the importance of words to the classification result, but the traditional TF-IDF algorithm considers neither the distribution of feature words between classes nor their distribution across the texts within a class. This thesis proposes the TF-IDF-BI algorithm. It introduces a "between-class factor" and an "intra-class factor" to measure the importance of a feature word to text classification between classes and within a class, respectively, derives the corresponding formulas, and incorporates both factors into TF-IDF to form the TF-IDF-BI algorithm.

Second, to address the problem that the traditional text convolutional neural network considers only local features and ignores global features in short headline text, this thesis proposes the Pre Info CNN model on the basis of Text CNN. The model uses the output of a Long Short-Term Memory (LSTM) network at each time step to form a semantic information matrix of the preceding text. This matrix is fused with the convolution result so that the model also captures the preceding semantic information during the convolution operation, and k-max pooling replaces max pooling so that the model retains more semantic features. Finally, the result is passed to a softmax function to obtain the classification.

Experiments show that the TF-IDF-BI algorithm improves final classification accuracy by 1.26% over the traditional TF-IDF algorithm, and that the Pre Info CNN model is 1.24% more accurate than the Text CNN model, verifying that the two innovations effectively improve text classification accuracy. Applied to a news headline classification system, the improved word vectors and the Pre Info CNN model classify short headline text more accurately, which can better guide journalists in classifying news texts.
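
The difference between max pooling and the k-max pooling mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `max_pooling` and `k_max_pooling`, the sample activations, and the choice k = 3 are assumptions made only for illustration, since the abstract does not state the value of k or any implementation details.

```python
from typing import List, Sequence


def max_pooling(feature_map: Sequence[float]) -> List[float]:
    """Standard max pooling over time: keep only the single largest activation."""
    return [max(feature_map)]


def k_max_pooling(feature_map: Sequence[float], k: int) -> List[float]:
    """Keep the k largest activations, in their original sequence order."""
    if k <= 0:
        raise ValueError("k must be positive")
    if k >= len(feature_map):
        return list(feature_map)
    # Indices of the k largest values; ties go to the earlier position.
    top = sorted(range(len(feature_map)), key=lambda i: (-feature_map[i], i))[:k]
    # Re-sort the selected indices so the output preserves word order.
    return [feature_map[i] for i in sorted(top)]


# Hypothetical activations of one convolution filter over a six-position headline.
conv_output = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8]
print(max_pooling(conv_output))       # [0.9]
print(k_max_pooling(conv_output, 3))  # [0.9, 0.7, 0.8]
```

With max pooling, only the strongest response of each filter survives. With k-max pooling, several strong responses survive together with their relative order, which is the sense in which the pooled output retains more features.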

Keywords/Search Tags:

Convolutional Neural Network, Long Short-Term Memory Network, Word Vector, TF-IDF, Text Classification

Related items

1	Short Text Sentiment Classification Based On Deep Learning
2	Research On Text Classification Based On Word Sense Disambiguation And Convolutional Neural Network
3	Research On Chinese Text Classification Method Based On Long And Short Term Memory Network
4	Research And Implementation Of Multilingual Text Classification System Based On Deep Learning
5	Text Classification Research Based On Deep Neural Network And Attention Mechanism
6	Sentiment Analysis Of Short Text Based On Improved Bidirectional LSTM Neural Network
7	Research On Text Classification Based On Deep Learning
8	Phishing Websites Detection Using Selected Features Classification And Bidirectional Long Short-Term Memory Neural Networks
9	Research On Sentiment Analysis Method And Application Based On Short Text Classification
10	Research Of Online Comment Text Sentiment Classification Based On Long-short Term Memory Network