
Research On Topic Extraction In Online Public Opinion Based On Multi-label Classification

Posted on: 2019-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Yang
GTID: 2427330626451950
Subject: Business Administration
Abstract/Summary:
With the development of the Internet and the spread of smart devices, online public opinion has become increasingly influential, and enterprises and government agencies pay growing attention to its application and management. The first task in applying and managing online public opinion is to extract key information from public opinion data, a task known as topic extraction. Current topic extraction methods are mainly based on probabilistic topic models, which extract text topics from the probability distributions linking topics to terms and terms to texts. However, probabilistic topic models do not fully account for the semantic relevance between terms and topics in a text. This thesis uses machine learning to extract topics from online public opinion and formulates topic extraction as a multi-label classification problem, in which each text topic is treated as a text category (label).

For measuring the similarity of text data, this thesis proposes a text semantic similarity method based on Baidu Baike annotation information. First, the text is preprocessed by word segmentation and other steps. Next, an improved TF-IDF method is used to weight the words in the Baidu Baike entries that correspond to the terms, turning each entry into a word-weight vector, and the similarity between entries is computed with cosine similarity. Finally, text similarity is computed from a similarity matrix built on the term-to-term similarity values. Experimental results on the Words-240 data set show that the semantic similarity based on Baidu Baike annotation information is highly correlated with manual annotations.

For multi-label classification of text data, this thesis designs a label-relationship-based multi-label classification method for the Kernel Extreme Learning Machine (KELM). The method learns positive and negative relationships between labels from their co-occurrence and non-co-occurrence distributions, and then uses these relationships to refine the classification predictions of the KELM. To verify the method's effectiveness, experiments were carried out on several real-world data sets: Zhihu, Yeast, Image, Scene, Emotions, and Cal500. The results show that the label-relationship-based KELM multi-label classification algorithm outperforms the comparison methods in accuracy, precision, recall, and F1 score.
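The entry-weighting and entry-similarity steps can be sketched as follows. The abstract does not say how the TF-IDF method was "improved," so this sketch assumes plain TF-IDF with a smoothed IDF, ln((1+n)/(1+df)) + 1, followed by cosine similarity between sparse word-weight vectors. The entry names and their tokenized texts are hypothetical stand-ins for segmented Baidu Baike entries.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Standard (not 'improved') TF-IDF weights for tokenized documents.

    docs: list of token lists, e.g. segmented encyclopedia entry texts.
    Returns one {word: weight} dict per document.
    """
    n = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vec = {}
        for word, count in tf.items():
            # Smoothed IDF keeps words that occur in every document at a small positive weight.
            idf = math.log((1 + n) / (1 + df[word])) + 1
            vec[word] = (count / total) * idf
        vectors.append(vec)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

# Hypothetical tokenized entry texts for three terms.
entries = {
    "apple": ["fruit", "plant", "sweet", "tree"],
    "banana": ["fruit", "plant", "sweet", "tropical"],
    "car": ["vehicle", "engine", "road"],
}
vecs = tfidf_vectors(list(entries.values()))
print(cosine(vecs[0], vecs[1]))  # shared words give a positive score
print(cosine(vecs[0], vecs[2]))  # no shared words, so 0.0
```

Each term's entry becomes one vector, so the term-to-term similarity used later is simply the cosine between the two entries' vectors.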
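The final step aggregates term-level similarities into a text-level score through a similarity matrix. The abstract does not give the aggregation rule, so this sketch assumes a common choice: build the |A| x |B| term-similarity matrix, take each term's best match in the other text (row and column maxima), and average the two directions. The `toy_sim` lookup table is hypothetical and stands in for the entry-based cosine similarities.

```python
def text_similarity(terms_a, terms_b, term_sim):
    """Text similarity from a term-to-term similarity matrix.

    Assumed aggregation: average of the mean row maxima (A -> B)
    and the mean column maxima (B -> A).
    """
    if not terms_a or not terms_b:
        return 0.0
    matrix = [[term_sim(a, b) for b in terms_b] for a in terms_a]
    # Best match in B for each term of A (row maxima).
    a_to_b = sum(max(row) for row in matrix) / len(terms_a)
    # Best match in A for each term of B (column maxima).
    b_to_a = sum(
        max(matrix[i][j] for i in range(len(terms_a))) for j in range(len(terms_b))
    ) / len(terms_b)
    return (a_to_b + b_to_a) / 2

# Hypothetical term similarities (in practice, cosine scores between entries).
pair_sims = {("apple", "banana"): 0.8}

def toy_sim(a, b):
    if a == b:
        return 1.0
    return pair_sims.get((a, b), pair_sims.get((b, a), 0.0))

print(text_similarity(["apple", "car"], ["banana", "car"], toy_sim))
```

Taking maxima rather than a plain average keeps a text from being penalized for terms that have no counterpart in the other text.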
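The base classifier can be sketched with the standard closed-form Kernel ELM solution, beta = (I/C + Omega)^(-1) T, with prediction f(x) = [K(x, x_1), ..., K(x, x_N)] beta, where Omega is the training kernel matrix and T holds one column per label. The RBF kernel, the values of `c` and `gamma`, the +1/-1 target encoding, and the zero decision threshold are all assumptions, since the abstract does not give them. A small pure-Python solver keeps the example self-contained; `numpy.linalg.solve` would normally replace it.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

class KernelELM:
    """Multi-label Kernel ELM: one output column per label."""

    def __init__(self, c=10.0, gamma=1.0):
        self.c = c          # regularization constant C
        self.gamma = gamma  # RBF width parameter

    def fit(self, x, t):
        self.x = x
        n = len(x)
        # Omega + I/C: training kernel matrix with ridge term on the diagonal.
        omega = [[rbf_kernel(x[i], x[j], self.gamma) + (1.0 / self.c if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        self.beta = solve(omega, t)
        return self

    def decision(self, z):
        """Real-valued score per label for a single sample z."""
        k = [rbf_kernel(z, xi, self.gamma) for xi in self.x]
        return [sum(k[i] * self.beta[i][j] for i in range(len(k)))
                for j in range(len(self.beta[0]))]

    def predict(self, z, threshold=0.0):
        return [1 if s > threshold else 0 for s in self.decision(z)]

# Toy data: two clusters, two labels encoded as +1/-1.
x_train = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
t_train = [[1, -1], [1, -1], [-1, 1], [-1, 1]]
model = KernelELM(c=10.0, gamma=1.0).fit(x_train, t_train)
print(model.predict([0.05, 0.0]), model.predict([1.05, 1.0]))
```

The raw scores from `decision` are what a label-relationship step would adjust before thresholding.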
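Learning positive and negative label relationships from co-occurrence, and then using them to refine classifier scores, can be illustrated with a hypothetical rule. The abstract does not give the actual criteria or update formula. This sketch marks a pair as positive when P(label j | label i) reaches `pos_threshold`, marks it as negative when both labels occur in the data but never together, and shifts each label's score by `alpha` times the score of every confidently positive related label. The thresholds, `alpha`, and the update rule are all illustrative assumptions.

```python
def label_relations(y, pos_threshold=0.7):
    """Pairwise label relations from a binary label matrix.

    y: list of 0/1 label vectors, one per training sample.
    Returns R with R[i][j] = +1 (positive), -1 (negative), or 0 (none).
    Hypothetical rule, not the thesis's exact criterion.
    """
    q = len(y[0])
    count = [sum(row[i] for row in y) for i in range(q)]
    rel = [[0] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            if i == j or count[i] == 0 or count[j] == 0:
                continue
            co = sum(1 for row in y if row[i] and row[j])
            if co == 0:
                rel[i][j] = -1  # both labels occur, but never together
            elif co / count[i] >= pos_threshold:
                rel[i][j] = 1   # label j usually accompanies label i
    return rel

def adjust_scores(scores, rel, alpha=0.5):
    """Shift label scores using relations of labels predicted positive."""
    adjusted = list(scores)
    for j in range(len(scores)):
        for i in range(len(scores)):
            if i != j and scores[i] > 0:
                adjusted[j] += alpha * rel[i][j] * scores[i]
    return adjusted

# Labels 0 and 1 always co-occur; label 2 never appears with either.
y_train = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
rel = label_relations(y_train)
raw = [0.8, -0.1, 0.2]  # e.g. KELM scores, thresholded at 0
adjusted = adjust_scores(raw, rel)
print([1 if s > 0 else 0 for s in raw], [1 if s > 0 else 0 for s in adjusted])
```

In this toy case, the strong score for label 0 pulls label 1 above the threshold and pushes label 2 below it, which is the kind of correction co-occurrence information is meant to provide.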
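The reported measures (accuracy, precision, recall, F1) have several multi-label variants. The abstract does not say which one was used, so this sketch assumes the common example-based definitions averaged over samples, with true label set Y and predicted set Z: accuracy = |Y∩Z|/|Y∪Z|, precision = |Y∩Z|/|Z|, recall = |Y∩Z|/|Y|, and F1 = 2|Y∩Z|/(|Y|+|Z|). Scoring a sample with both sets empty as perfect is also an assumed convention.

```python
def example_based_metrics(y_true, y_pred):
    """Example-based multi-label metrics averaged over samples.

    y_true, y_pred: lists of 0/1 label vectors of equal length.
    """
    n = len(y_true)
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for t_row, p_row in zip(y_true, y_pred):
        t = {i for i, v in enumerate(t_row) if v}
        p = {i for i, v in enumerate(p_row) if v}
        if not t and not p:
            # Assumed convention: empty prediction for empty truth is perfect.
            for key in totals:
                totals[key] += 1.0
            continue
        inter = len(t & p)
        totals["accuracy"] += inter / len(t | p)
        totals["precision"] += inter / len(p) if p else 0.0
        totals["recall"] += inter / len(t) if t else 0.0
        totals["f1"] += 2 * inter / (len(t) + len(p))
    return {k: v / n for k, v in totals.items()}

print(example_based_metrics([[1, 1, 0], [0, 1, 1]], [[1, 0, 0], [0, 1, 1]]))
```

Label-based (macro or micro averaged) versions of the same measures can rank methods differently, so stating which variant was used matters when comparing results.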
Keywords: Online public opinion, Topic extraction, Semantic analysis, Multi-label classification, Kernel Extreme Learning Machine