Research On News Text Classification Method Based On Deep Learning

Posted on: 2024-06-07 | Degree: Master | Type: Thesis
Country: China | Candidate: P Xu | Full Text: PDF
GTID: 2568307106982169 | Subject: Electronic information
Abstract/Summary:
With the development of the Internet, news spreads ever faster, producing a massive volume of news text and making it hard for readers to find the content they want quickly. The demand for news text classification is increasingly urgent, and traditional manual classification cannot keep up with it. As computer technology has progressed, applying artificial intelligence methods to news text classification has received growing attention. Traditional artificial intelligence methods concentrate on extracting an effective representation from news content and on using a model to learn a mapping from news features to news categories. Human readers, by contrast, can usually tell both which category a news item belongs to and which categories it does not belong to; the non-corresponding category information reinforces the judgment about the corresponding category and improves classification accuracy. This observation suggests a new approach to news text classification.

After an in-depth study of news text classification and text matching techniques, this thesis recasts news text classification as a text matching problem, using Siamese (twin) neural networks to mine the relationship between news text information and news categories. It focuses on the problem that news category information is not used effectively, and improves classifier accuracy by making full use of both news text information and news category information. The main innovations are as follows:

(1) A news text classification method based on Siamese neural networks is proposed to address the problem that news category information is not used effectively. The method turns news text classification into a matching problem between news headlines and news categories, making full use of the category information. First, each news headline is paired with each news category. The model is then trained to minimize the difference for corresponding headline-category pairs and to maximize the difference for non-corresponding pairs, and the network predicts a degree of difference between a news headline and each news category. Finally, the degrees of difference are ranked, and the category with the smallest difference is taken as the news category. The model makes full use of the news category information, the news text information, and the matching relationship between them to improve classification accuracy.

(2) To address the low classification accuracy obtained when news headlines are simply paired with news categories, a news text classification method based on the fusion of multiple feature representations is proposed. First, a multi-feature representation fusion framework is constructed: news headline features, features of the first sentence of the news content, features of the last sentence of the news content, and topic features of the news content are extracted to form a news text feature group; news category features and supplementary category explanation features are selected to form a news category feature group. Matching news headlines against news categories is thereby extended to matching news text feature groups against news category feature groups. Then, based on the Siamese network structure, a news text classification model based on multi-feature representation fusion is constructed, consisting of a text representation module, an information interaction module, a local inference module, an inference combination module, and a prediction module. In the text representation module, the news text feature group and the news category feature group are represented as vectors. In the information interaction module, the vector extracted by the attention layer is concatenated with the original representation vector, and the hidden states are then obtained with a BiLSTM. In the local inference module, the news word vectors and the news category word vectors are multiplied pairwise to mine the relationship between each word in the news text and each word in the news category; to judge from the difference whether the text and the category are closely enough related, the difference and the product between the new sequence and the old sequence are computed, and all the resulting information is collected to form a news text information sequence and a news category information sequence. In the inference combination module, a BiLSTM performs global analysis, and average pooling and max pooling are then applied to the sequence. The prediction module uses a multilayer perceptron classifier to obtain the degree of difference and determine the news category. Finally, the news text classification method is optimized with the Bagging mechanism, which further improves the model's generalization ability and the accuracy of news text classification.
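
The matching-and-ranking idea in innovation (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name a loss function or a distance measure, so the contrastive loss with a margin, the Euclidean distance, the function names, and the hand-written two-dimensional embeddings below are all assumptions made for illustration. In the thesis the degree of difference is produced by a trained network, not by a fixed distance.

```python
import math

def difference_degree(text_vec, category_vec):
    """Assumed stand-in for the network's output: Euclidean distance
    between a headline embedding and a category embedding."""
    return math.dist(text_vec, category_vec)

def contrastive_loss(distance, is_match, margin=1.0):
    """Assumed training objective: shrink the distance of corresponding
    pairs and push non-corresponding pairs at least `margin` apart."""
    if is_match:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

def classify(headline_vec, category_vecs):
    """Pair the headline with every category, rank the categories by
    difference degree, and return the closest one plus the full ranking."""
    ranking = sorted(
        category_vecs,
        key=lambda name: difference_degree(headline_vec, category_vecs[name]),
    )
    return ranking[0], ranking

# Hypothetical toy embeddings; a real model would compute them with a
# shared (Siamese) encoder applied to headline text and category text.
category_vecs = {
    "sports": [1.0, 0.0],
    "finance": [0.0, 1.0],
    "technology": [0.7, 0.7],
}
headline_vec = [0.9, 0.1]

best, ranking = classify(headline_vec, category_vecs)
print(best, ranking)
```

Because every category is scored against the same headline, the non-corresponding categories contribute to training as negative pairs instead of being ignored, which is the use of category information that the abstract emphasizes.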
Keywords/Search Tags: Text classification, Siamese neural network, News title, Match, Multi-feature representation