Research On Short Text Classification Of Chinese News Based On Machine Learning

Posted on:2023-08-21

Degree:Master

Type:Thesis

Country:China

Candidate:B L Zhang

Full Text:PDF

GTID:2568306806479154

Subject:Computer technology

Abstract/Summary:

As the volume of text data grows, text classification becomes increasingly important. Automatic text classification techniques are needed to organize text data and extract valuable information from it. This research takes Chinese news short texts as a case study for Chinese short text classification. Short texts contain few words and have sparse features, so conventional text classification methods do not work well on them. To increase the number of features in short texts and improve classification accuracy, we present a short text feature extension method based on the Latent Dirichlet Allocation (LDA) model and the TextRank algorithm. The main work and research results include:

(1) We analyze in detail the advantages and disadvantages of the Naive Bayes algorithm, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), the Decision Tree algorithm and the Logistic Regression algorithm. We apply text classification techniques for data cleaning, word segmentation and feature processing of the texts. We then use these five machine learning algorithms to run Chinese news short text classification experiments and compare the classification results.

(2) We present a short text feature extension method based on the LDA model and the TextRank algorithm to address the sparsity of short text features. We first use the LDA model to obtain the hidden topic features of each text, then use the TextRank algorithm to obtain the keywords of the text, and finally add the keywords corresponding to the text's hidden topic features to the short text as feature expansion words. This increases the number of features in short texts and adds more useful information for subsequent classification.

(3) Taking Chinese news short text classification as an example, we improve the text classification method through the extraction and expansion of feature words. We use the Naive Bayes, SVM, KNN, Decision Tree and Logistic Regression algorithms to verify the improved method proposed in this thesis. We also use the Word2Vec model to run verification experiments on the THUCNews dataset, in order to further verify the effectiveness of the method. The results show that the method improves the accuracy of text classification.

Expanding short texts with the feature expansion method increases the number of text features and effectively alleviates the feature sparsity of short texts, which is significant for classifying short texts correctly.
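The feature extension step described in (2) can be illustrated with a minimal sketch. This is one possible reading of the abstract, not the thesis's implementation. It assumes each text has already been segmented into tokens and that a separately trained LDA model has assigned each text a dominant topic id (the `doc_topics` list below is hypothetical input, not computed here). The function names `textrank_keywords` and `expand_corpus`, the window size, the damping factor of 0.85, and the English stand-in tokens are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank tokens with a simplified TextRank on an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each token to the tokens that follow it inside the window.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        # No co-occurrence edges: fall back to first-seen order.
        return list(dict.fromkeys(tokens))[:top_k]
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # PageRank-style update: each neighbor shares its score over its degree.
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def expand_corpus(docs, doc_topics, top_k=3, per_doc=2):
    """Append keywords pooled from same-topic texts to each short text."""
    # Pool TextRank keywords per topic (topic ids are assumed to come from LDA).
    pool = defaultdict(Counter)
    for tokens, topic in zip(docs, doc_topics):
        pool[topic].update(textrank_keywords(tokens, top_k=top_k))
    expanded = []
    for tokens, topic in zip(docs, doc_topics):
        seen = set(tokens)
        candidates = sorted(pool[topic], key=lambda w: (-pool[topic][w], w))
        extra = [w for w in candidates if w not in seen][:per_doc]
        expanded.append(tokens + extra)
    return expanded


# English stand-ins for segmented Chinese news headlines (illustrative only).
docs = [
    ["stock", "market", "rises", "investor", "confidence"],
    ["central", "bank", "cuts", "rates", "stock", "rebound"],
    ["team", "wins", "title", "fans", "celebrate"],
    ["league", "team", "home", "win"],
]
doc_topics = [0, 0, 1, 1]  # hypothetical dominant LDA topic per text

for original, longer in zip(docs, expand_corpus(docs, doc_topics)):
    print(original, "->", longer[len(original):])
```

Each expanded token list can then be vectorized and passed to any of the five classifiers named in the abstract. Pooling keywords by topic is a design choice of this sketch: it lets a short text borrow vocabulary from other texts that share its hidden topic, which is the stated purpose of the extension.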

Keywords/Search Tags:

Chinese short text, Text classification, Feature words extraction, Feature extension, LDA model, TextRank algorithm

Related items

1	Design And Implementation Of Chinese Short Text Classification Method
2	Short Text Classification Based On Integration Of Ontology And BTM Feature Extension
3	Research On Classification Method On Chinese Short Texts With Few Words Based On Feature Representation
4	Research On Short Text Classification Method Based On Feature Extension
5	Short Text Classification Based On Feature Extension
6	Feature Extension Method for Short-text Classification Based On LDA
7	Research On Short Text Classification Technology Based On LDA Feature Extension
8	Research On Short Text Data Stream Classification Based On Feature Extension And Selection
9	Research On Short Text Classification
10	Short Text Classification Algorithm Of Deep-learning Based On Feature Extension