| With the rapid development of Internet technologies, social media platforms of all kinds are emerging, and the public can use them to shift from receiving information to spreading it. As a result, governments and enterprises increasingly value online public opinion. However, the sparsity of social media texts makes it difficult for researchers to process large volumes of public opinion data correctly and quickly. Taking the "Kong Yiji Literature" event as an example, this paper crawls the relevant micro-blog data posted within 48 hours after the event appeared on the Sina Micro-blog trending list and constructs a text co-occurrence network from it. Based on the key nodes and key edges of this network, a backbone network extraction method is proposed to overcome the sparsity of short social media texts, and a community detection algorithm is applied to the backbone network to analyze public opinion topics. The specific work is as follows. First, after word segmentation and stop-word removal, the TF-IDF algorithm was used to rank the words in the blog data. Because the TF-IDF values follow a heavy-tailed distribution, keywords were preliminarily selected by combining head/tail breaks with the 80-20 rule. With the resulting 583 keywords, the text co-occurrence network was constructed from the co-occurrence relationships between words in the text data set. Second, three modularity-optimization algorithms were compared for community detection; the Louvain algorithm had the lowest time complexity, the highest modularity, and the best community division. However, because of the sparsity of short social media texts, the nodes of the network are closely connected and the network forms a single dense cluster, which makes it difficult to achieve an ideal community division on the original network. Therefore, a backbone network extraction method based on key nodes and key edges is proposed. To extract key nodes, ten node importance indexes were selected, including centrality, clustering coefficient, PageRank value, and k-shell value; PCA was used to reduce the dimensionality of the indexes; the TOPSIS algorithm was used to score each node on the resulting principal components; and the key nodes were selected by ranking the nodes from highest to lowest score. To handle redundant edges, edges were ranked by edge betweenness centrality, and the ratio of the maximum connected subgraph was used to measure network integrity. Network integrity and modularity were considered jointly when deciding which edges to preserve, so that the network remains relatively intact while the community division approaches the ideal result. Finally, the backbone network was obtained from the key nodes and key edges. Applying the Louvain algorithm to it yielded six communities and increased the modularity by more than 0.2. Taking node importance into account, each community was manually summarized to obtain its public opinion theme. The same method was then applied to the blog data from different time periods to divide content communities and analyze how public opinion on the event evolved. Regarding the social phenomenon reflected by this topic, suggestions for follow-up work are put forward from the perspectives of society, universities, education, and individuals. The proposed method, community detection on the backbone of a text co-occurrence network, can process text data quickly and effectively, identify public opinion topics, and track the evolution of public opinion when dealing with large amounts of sparse social media text. |
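The keyword selection and network construction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the exact TF-IDF variant, the head/tail stopping rule, or how the 80-20 rule is combined with the breaks, so the corpus-level score (sum of per-document TF-IDF), the 40% head limit, and the function names `tfidf_scores`, `head_tail_breaks`, and `cooccurrence_edges` are all assumptions made for illustration. Documents are assumed to be already segmented and stripped of stop words.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_scores(docs):
    """Sum per-document TF-IDF for each word (an assumed corpus-level variant)."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = Counter()
    for doc in docs:
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[word])
            scores[word] += tf * idf
    return dict(scores)


def head_tail_breaks(values, head_limit=0.4):
    """Return successive mean thresholds; each one splits off a smaller head.

    The recursion stops once the head is no longer a clear minority
    (head_limit is an assumed stopping share, not a value from the paper).
    """
    breaks = []
    current = list(values)
    while len(current) > 1:
        mean = sum(current) / len(current)
        head = [v for v in current if v > mean]
        if not head or len(head) / len(current) > head_limit:
            break
        breaks.append(mean)
        current = head
    return breaks


def cooccurrence_edges(docs, keywords):
    """Count, for each keyword pair, the number of documents containing both."""
    edges = Counter()
    for doc in docs:
        present = sorted(set(doc) & set(keywords))
        edges.update(combinations(present, 2))
    return edges


# Toy, pre-segmented documents (hypothetical data, for illustration only).
toy_docs = [["a", "b", "c"], ["a", "b"], ["c", "d"]]
toy_scores = tfidf_scores(toy_docs)
toy_edges = cooccurrence_edges(toy_docs, {"a", "b", "c"})
print(sorted(toy_scores, key=toy_scores.get, reverse=True))
print(dict(toy_edges))
```

In this sketch the edge weight is the number of documents in which two keywords appear together; a sliding-window count would be an equally plausible reading of "co-occurrence relationship".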
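The edge-preservation step weighs two quantities against each other: network integrity, measured as the share of nodes in the maximum connected subgraph, and modularity. The sketch below computes both for an unweighted, undirected graph, assuming every edge endpoint appears in the node list. It does not rank edges by betweenness centrality, and the abstract does not say how the two measures are combined, so this only shows the trade-off on a toy graph. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def largest_component_ratio(nodes, edges):
    """Share of nodes that belong to the largest connected component."""
    adj = {node: set() for node in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # iterative depth-first search over one component
            node = stack.pop()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best / len(adj) if adj else 0.0


def modularity(edges, community_of):
    """Newman modularity Q for an unweighted, undirected graph without self-loops."""
    m = len(edges)
    if m == 0:
        return 0.0
    degree_sum = defaultdict(int)  # total degree per community
    internal = defaultdict(int)    # edges with both ends in one community
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by one bridge edge (toy graph, for illustration only).
toy_nodes = [0, 1, 2, 3, 4, 5]
triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
bridge = [(2, 3)]
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

for label, edge_list in (("with bridge", triangles + bridge), ("without bridge", triangles)):
    print(label, largest_component_ratio(toy_nodes, edge_list), modularity(edge_list, partition))
```

On the toy graph, removing the bridge raises modularity but halves the integrity ratio. This is the tension the edge-preservation criterion has to balance.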