| Text classification is a key technology in natural language processing; it is both an active research area and a difficult one. On the one hand, corpora for Chinese text classification are relatively scarce; on the other hand, Chinese is more complex and harder to recognize than English. As a result, features are difficult to extract with traditional methods, and classical machine learning algorithms require manual feature extraction before data can be classified. Compared with classical machine learning, deep learning simplifies feature extraction and alleviates the problem of high-dimensional, sparse matrices, which improves the accuracy of text classification. This paper combines several mainstream deep learning models and analyzes and compares them on news text datasets. The specific work is as follows: 1. A network model based on LSTM-CNN-Attention is proposed to address the problem that a traditional convolutional neural network (CNN) or long short-term memory (LSTM) network alone cannot extract text features well. First, the Word2vec model is used to obtain word vector representations of the text. Second, the LSTM network extracts context information from the full text. Then the LSTM output and the original input are combined to obtain new features. Finally, a multi-channel CNN-Attention structure extracts local features. Experimental results on the NetEase news dataset show that this model reaches an accuracy of 90.3%, a precision of 87.1%, a recall of 87.5%, and an F1 value of 86.7%, improving on the other models compared on all four indicators. 2. To address the limitations of LSTM in extracting local information for text classification, a text classification model integrating LSTM-Attention and CNN is proposed. Structurally, the LSTM first extracts global sequence information, and an attention mechanism then assigns weights to the LSTM output. The local information of the original text is extracted by a three-layer CNN, which adopts a serial structure and selectively fuses the original input information with the CNN output. Finally, the outputs of the two branches are combined to obtain new features, and softmax produces the probability of each category. On the THUCNews dataset, the model reaches an accuracy of 96.8%, a precision of 96.8%, a recall of 96.7%, and an F1 value of 96.9%. |
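
The attention step described above, in which weights are applied to the LSTM output, can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so dot-product scoring against a single query vector is an assumption here, and the names `softmax`, `attention_pool`, `hidden_states`, and `query` are hypothetical. The hidden states are toy values standing in for real LSTM outputs.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Weight each time step's hidden state and sum them into one vector.

    Assumption: scores are dot products with a query vector; the paper's
    actual scoring function is not given in the abstract.
    """
    # One score per time step.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Weighted sum of hidden states, computed per dimension.
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy stand-in for LSTM outputs: 3 time steps, hidden size 2.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]  # hypothetical query vector (learned in a real model)
weights, context = attention_pool(hidden_states, query)
```

In this toy input the third time step aligns best with the query, so it receives the largest weight. In a real model the query would be a trained parameter and the hidden states would come from the LSTM layer.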