Chinese Short Text Classification Based On Convolutional Neural Network Combined With Word Vector

Posted on:2020-06-27

Degree:Master

Type:Thesis

Country:China

Candidate:Y M He

GTID:2428330572485650

Subject:Engineering

Abstract/Summary:

Text categorization is a key technology for text information processing in the field of natural language processing. It consists mainly of a text representation and a classification model (algorithm). In today's era of rapidly growing textual information, text categorization plays a major role in obtaining the needed information effectively, conveniently, and quickly. Short text, one of the main carriers of textual information, is characterized by short length, sparse features, dynamic and real-time content, and irregular format. As a result, traditional machine learning algorithms based on bag-of-words features or the vector space model cannot effectively extract features from short text, which in turn degrades classification performance. In recent years, using the strong feature extraction ability of deep feature learning models for text categorization has become a research hotspot. Based on the convolutional neural network model and the word vector text representation, this paper studies the key technical points of Chinese short text classification. The main results are as follows:

1. A word vector model for convolutional neural network text classification is proposed. Text feature extraction (the input representation of text) is central to text classification, and its quality directly affects the performance of the classification system. The most popular input representation today, the word vector, captures the relevance and similarity between words but ignores contextual word-order features, which in some cases causes semantic loss and distortion of the text. To address this, this paper proposes WordNGVec, a word vector model that combines N-Gram features with Word2vec, and uses the extracted word vectors (Word-NG vectors) as the input to a two-channel convolutional neural network model (DC-CNN). Several sets of comparative experiments show that the proposed method can effectively improve text classification under three evaluation metrics: precision, recall, and F1.

2. A text classification model based on a convolutional neural network with regularized hierarchical Softmax is proposed. The output layer of the traditional convolutional neural network (CNN) classification model uses the standard Softmax with a flat architecture. In text classification tasks with large amounts of data and many categories, its computational complexity is high and training takes a long time. Hierarchical Softmax (H-Softmax), an improved algorithm based on the Huffman tree, can greatly improve training speed. However, because it adds a large number of node parameters, optimization becomes harder and requires more iteration steps, and the model overfits easily, which in turn affects the model's fitting speed and classification performance. To address this, this paper proposes an improved model, RHS-CNN (Regularization Hierarchical Softmax CNN), which uses regularization to constrain the node parameters of H-Softmax, avoiding overfitting and enhancing the generalization ability of the model. Experimental analysis shows that the proposed method achieves some improvement over Softmax and H-Softmax on the corresponding evaluation metrics.
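
The idea behind a regularized hierarchical Softmax can be illustrated with a small sketch. This is not the RHS-CNN implementation from the thesis: the abstract does not say which regularizer is used, so an L2 penalty on the node parameters is assumed here, and all function names, class labels, counts, and weights are hypothetical. The sketch builds a Huffman tree over class frequencies, computes a class probability as a product of binary decisions along the root-to-leaf path (so only the nodes on one path are evaluated, rather than one score per class as in flat Softmax), and adds the penalty to the negative log-likelihood.

```python
import heapq
import itertools
import math


def build_huffman_paths(class_counts):
    """Build a Huffman tree over classes.

    Returns (paths, n_inner), where paths[label] is the root-to-leaf list of
    (inner_node_id, branch_bit) pairs and n_inner is the number of inner nodes.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares payloads
    heap = [(count, next(tie), ("leaf", label))
            for label, count in class_counts.items()]
    heapq.heapify(heap)
    n_inner = 0
    while len(heap) > 1:
        c0, _, left = heapq.heappop(heap)    # two least frequent subtrees
        c1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (c0 + c1, next(tie), ("inner", n_inner, left, right)))
        n_inner += 1
    root = heap[0][2]

    paths = {}

    def walk(node, prefix):
        if node[0] == "leaf":
            paths[node[1]] = prefix
            return
        _, node_id, left, right = node
        walk(left, prefix + [(node_id, 0)])
        walk(right, prefix + [(node_id, 1)])

    walk(root, [])
    return paths, n_inner


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def class_probability(features, path, node_weights):
    """Probability of one class: product of binary decisions along its path."""
    prob = 1.0
    for node_id, bit in path:
        score = sum(w * x for w, x in zip(node_weights[node_id], features))
        p_right = sigmoid(score)
        prob *= p_right if bit == 1 else 1.0 - p_right
    return prob


def regularized_loss(features, label, paths, node_weights, l2_strength):
    """Negative log-likelihood plus an (assumed) L2 penalty on node weights."""
    nll = -math.log(class_probability(features, paths[label], node_weights))
    penalty = l2_strength * sum(w * w for ws in node_weights for w in ws)
    return nll + penalty


if __name__ == "__main__":
    # Hypothetical class frequencies and a 2-dimensional feature vector.
    counts = {"sports": 50, "finance": 30, "tech": 15, "health": 5}
    paths, n_inner = build_huffman_paths(counts)
    weights = [[0.1, -0.2], [0.3, 0.05], [-0.4, 0.2]]  # one vector per inner node
    x = [1.0, 2.0]
    for label, path in paths.items():
        print(label, len(path), class_probability(x, path, weights))
    print(regularized_loss(x, "tech", paths, weights, l2_strength=0.01))
```

Frequent classes get short paths in the Huffman tree, so the average number of node evaluations per training example is lower than the number of classes. In a CNN classifier, `features` would be the output of the last hidden layer, and the penalty term would be added to the training objective so that the node parameters are constrained during optimization.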

Keywords/Search Tags:

text classification, text representation, word vector, convolutional neural network

Related items

1	Research On Improvement Of Chi-square Feature Selection And Word Vector Text Representation For News Classification
2	Research On Text Classification Based On Word Vector And Deep Learning
3	Research And Implementation Of Text Sentiment Analysis System Based On Neural Network Model
4	Short Text Classification Based On Multi-granularity Feature Representation And Recurrent Convolutional Neural Network
5	Research On Text Classification Method Based On Convolutional Neural Network
6	Research On Text Classification Model Based On Deep Neural Network
7	A Research On Text Vector Representation Based On Semantics
8	Text Classification Based On Convolutional Neural Network
9	Research On Text Classification Based On Word Sense Disambiguation And Convolutional Neural Network
10	Research On Chinese Text Classification Based On Convolutional Neural Networks