
A Comparative Study Of Multiple Classification Methods Based On Long Texts Of News

Posted on: 2020-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: J X Zhou
GTID: 2417330590982851
Subject: Applied Statistics
Abstract/Summary:
Text categorization is a branch of natural language processing (NLP). This thesis studies the classification of long news texts. The data were crawled from the Sina News client with a Python toolkit: more than 20,000 news items in seven categories (finance, military, tourism, technology, health, entertainment and sports). In the empirical analysis, traditional machine learning algorithms and deep learning algorithms are applied to this real-world problem, and the advantages and disadvantages of the different methods for text classification are compared. First, text classification is performed with traditional text representation and feature selection methods combined with a traditional machine learning algorithm, the support vector machine (SVM). Next, word vectors are introduced into the traditional machine learning pipeline: word vectors and document vectors trained by a neural network model are used to represent the text and are combined with the SVM model for classification. Representing text with word vectors avoids the "dimension explosion" and makes use of text semantics and word order information, so the classification results improve greatly. Then, a convolutional neural network (textCNN) and a long short-term memory network (LSTM) are used for text categorization. Compared with traditional machine learning models, deep learning models extract important features automatically, which avoids complicated feature engineering. Finally, a convolutional neural network can only consider local information in the text, while a recurrent neural network considers global information but cannot reflect the relative importance of that information. This thesis therefore introduces the Attention mechanism into the traditional LSTM model; the resulting model is called the Bi-LSTM-Attention model. The model improves the classification results, and the experiments demonstrate the effectiveness of the Attention mechanism.
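The abstract does not give the formula used for the Attention layer, so the following is only a minimal sketch of one common form of attention pooling over recurrent hidden states, written in plain Python. The function names `softmax` and `attention_pool`, the `context` vector (which would be learned during training), and the toy numbers are all assumptions for illustration, not the thesis's implementation. It shows how each time step receives a weight, so that the pooled text vector reflects the importance of individual positions rather than treating them equally.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Weighted sum of hidden states (hypothetical helper).

    hidden_states: list of T vectors, e.g. the per-token outputs of a Bi-LSTM.
    context: a vector of the same dimension; assumed to be a learned parameter.
    Returns (pooled_vector, attention_weights).
    """
    # Score each time step: dot product of tanh(h_t) with the context vector.
    scores = [
        sum(math.tanh(h_d) * c_d for h_d, c_d in zip(h, context))
        for h in hidden_states
    ]
    weights = softmax(scores)
    dim = len(context)
    # The pooled representation is the attention-weighted average of the states.
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights


# Toy example: three time steps with 2-dimensional hidden states.
states = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]
pooled, weights = attention_pool(states, context=[1.0, 1.0])
print(weights)  # the last time step receives the largest weight
print(pooled)
```

In a full model the pooled vector would be passed to a softmax classifier over the seven news categories; with a zero context vector the weights become uniform and the pooling reduces to a plain average of the hidden states.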
Keywords/Search Tags: text classification, support vector machine, convolutional neural network, long short-term memory network, Attention mechanism