
Research On Short Text Classification Based On Deep Neural Network

Posted on: 2022-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: S B Wang
Full Text: PDF
GTID: 2518306509988969
Subject: Applied Statistics

Abstract/Summary:
Internet technology has developed rapidly in recent years. Major internet platforms such as Alibaba, Meituan, TikTok, and Toutiao generate massive amounts of data every day, and text data is an important part of it. With the help of big data analysis technology, mining the valuable information contained in this data not only brings large profits to companies but also plays an important role in fields such as social governance and national security. Text classification extracts the content features of a document through a model and automatically assigns the document to predefined categories. Because text data is unstructured, how to quantify it, extract its features, and classify it accurately has become one of the most basic tasks in artificial intelligence. With the rapid development of deep learning, deep neural network models have shown very good performance in natural language processing. For text classification, this paper studies text representation, feature extraction, and the selection and integration of classification models, and tries to improve several of these steps.

For representing unstructured text, common methods include the bag-of-words model (BOW), term frequency (TF), and inverse document frequency (IDF). These methods can represent the relationship between a word and the whole text, but they lose the information carried by the word itself. The vector space model (VSM) works well for long texts, but the sparsity and irregularity of short texts make its performance unsatisfactory. Compared with VSM, Word2Vec solves the high-dimensionality and sparsity problems of traditional text representation models, but it does not reflect the importance of feature words in the text. TF-IDF emphasizes the importance of low-frequency words in the whole document collection, but it does not account for the uneven distribution of feature words between classes, nor for their uneven distribution among different documents within the same class, which adversely affects classification results. This paper therefore introduces a class weighting factor and an intra-class weighting factor into TF-IDF to address these problems, and forms a new text representation method by combining the Word2Vec model with the improved TF-IDF weights.
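The abstract does not give the exact formulas for the class and intra-class weighting factors, so the sketch below is only a minimal illustration of the idea: each word's Word2Vec vector is weighted by a TF-IDF score multiplied by two assumed factors (one rewarding terms concentrated in the document's class, one rewarding terms spread evenly across that class), and the weighted vectors are averaged into a document vector. The names `doc_vector`, `word_vectors`, `class_factor`, and `intra_factor` are illustrative, not taken from the thesis.

```python
import math
import numpy as np

def doc_vector(doc, doc_class, docs, classes, word_vectors, dim=100):
    """Weighted-average Word2Vec document embedding.

    `doc` is a token list, `docs`/`classes` the training corpus and its labels,
    and `word_vectors` maps tokens to pre-trained Word2Vec vectors of size `dim`.
    The two extra factors are assumptions standing in for the thesis's
    class and intra-class weighting factors.
    """
    n_docs = len(docs)
    same_class = [d for d, c in zip(docs, classes) if c == doc_class]
    vec, total_w = np.zeros(dim), 0.0

    for term in set(doc):
        if term not in word_vectors:
            continue
        # Standard (smoothed) TF-IDF.
        tf = doc.count(term) / len(doc)
        df = sum(1 for d in docs if term in d)
        idf = math.log((1.0 + n_docs) / (1.0 + df)) + 1.0

        # Class weighting factor: share of the term's documents that belong
        # to this document's class, so class-discriminative terms gain weight.
        df_class = sum(1 for d in same_class if term in d)
        class_factor = df_class / (1.0 + df)

        # Intra-class weighting factor: how widely the term is spread over
        # documents of the same class; evenly spread terms gain weight.
        intra_factor = df_class / max(1, len(same_class))

        w = tf * idf * class_factor * intra_factor
        vec += w * word_vectors[term]
        total_w += w

    return vec / total_w if total_w > 0 else vec
```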
Convolutional neural networks (CNN) and bidirectional long short-term memory networks (BiLSTM) are classic deep neural networks for feature extraction and have shown excellent performance in text processing and computer vision. However, CNN tends to extract local features of text and cannot capture contextual features well. LSTM, an improvement on the recurrent neural network (RNN), alleviates the vanishing and exploding gradient problems and has a memory mechanism, but its design can only extract features from the preceding context, ignoring the following context. BiLSTM combines a forward LSTM and a backward LSTM and can effectively extract global contextual features, but it is weaker at extracting local features. Therefore, this paper combines BiLSTM and CNN so that their strengths complement each other and the feature information of the text can be extracted more fully.

By improving the text representation and integrating the classification models, this paper constructs the TDFMIX model. Comparisons of different models on multiple corpora show that the TDFMIX model improves text classification performance.
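The abstract states that BiLSTM and CNN are combined so that global contextual and local features complement each other, but it does not specify the architecture. The PyTorch sketch below shows one plausible arrangement (embedding, then BiLSTM, then parallel convolutions with max-pooling feeding a linear classifier); the layer sizes, kernel widths, and the class name `BiLSTMCNNClassifier` are assumptions, not the thesis's actual TDFMIX configuration.

```python
import torch
import torch.nn as nn

class BiLSTMCNNClassifier(nn.Module):
    """Minimal BiLSTM + CNN text classifier (illustrative, not the TDFMIX spec)."""

    def __init__(self, vocab_size, embed_dim=100, hidden=128,
                 n_filters=100, kernel_sizes=(3, 4, 5), n_classes=10):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # BiLSTM extracts global, bidirectional context features.
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                              bidirectional=True)
        # Parallel 1-D convolutions extract local n-gram features
        # from the BiLSTM outputs.
        self.convs = nn.ModuleList(
            [nn.Conv1d(2 * hidden, n_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.embedding(token_ids)                 # (batch, seq, embed)
        x, _ = self.bilstm(x)                         # (batch, seq, 2*hidden)
        x = x.transpose(1, 2)                         # (batch, 2*hidden, seq)
        # Max-pool each convolution's output over the sequence dimension.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))      # (batch, n_classes)


# Example: classify a batch of 8 padded sequences of length 50.
model = BiLSTMCNNClassifier(vocab_size=30000)
logits = model(torch.randint(0, 30000, (8, 50)))
```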
Keywords/Search Tags: Text Classification, Improved TF-IDF Model, Feature Extraction, Convolutional Neural Network, BiLSTM