| In the era of the mobile Internet, social media platforms such as Twitter, Facebook, Weibo, and Zhihu are booming. As a rising star, WeChat has nearly 1 billion monthly users. According to statistics, the WeChat public platform hosts more than 20 million official accounts, which publish an average of 107 million pieces of content per month, making it one of the main venues for information dissemination and the formation of public opinion. Comprehensive and efficient collection and analysis of WeChat data has important applications in hot topic discovery, real-time tracking of emergencies, and public opinion monitoring. To address the strict request rate limits of the WeChat API, the low openness of its interfaces, and the incomplete data obtained by methods that rely on crawling third-party websites such as Sogou WeChat, this paper designs and implements a mobile data collection system. The system moves data collection to the mobile terminal and uses an automated testing framework to simulate a normal user's clicks, browsing, and other in-app requests to the server. This enables collection of the complete publication history of an official account and full-dimensional data for each article, including the text, user comments, and like counts. Further, for topic detection and evolution analysis on the collected data, we propose a topic detection method based on denoising and a topic evolution method based on enhanced fonts. First, the articles of an official account include hot news together with a large amount of non-hot news. If a clustering algorithm is applied to them directly, it is easily affected by outliers (non-hot news) and the clustering quality is poor. Because the collected data is comprehensive, this paper proposes a multi-dimensional method for detecting effective reports in order to remove noise, which both improves clustering quality and reduces clustering cost, a particular concern given the massive data volumes of the big data era. Second, because even news official accounts regularly publish non-event content such as advertorials and advertisement placements, existing news topic detection based on article titles and introductions is no longer applicable to WeChat official accounts. Based on the typesetting characteristics of WeChat articles, this paper proposes a topic evolution method based on enhanced fonts. Experimental results show that this method outperforms traditional methods overall. |
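
The abstract does not specify how enhanced fonts are identified, so the following is only a minimal sketch of one possible reading: treating text inside bold or heading markup in an article's HTML as candidate topic cues. The tag set `EMPHASIS_TAGS`, the inline-style check, and the names `EmphasisExtractor` and `extract_emphasized` are all assumptions made for illustration and are not taken from the paper.

```python
from html.parser import HTMLParser

# Assumption (not from the paper): these tags mark "enhanced" text.
EMPHASIS_TAGS = {"strong", "b", "em", "h1", "h2", "h3"}
# Void elements never receive an end tag, so they are kept off the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class EmphasisExtractor(HTMLParser):
    """Collects text that appears inside emphasized elements."""

    def __init__(self):
        super().__init__()
        self._stack = []      # one flag per open element: is it emphasized?
        self.fragments = []   # emphasized text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Normalize the inline style so "font-weight: bold" also matches.
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        emphasized = tag in EMPHASIS_TAGS or "font-weight:bold" in style
        self._stack.append(emphasized)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Simplification: assumes well-nested markup with explicit end tags.
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Keep the text if any enclosing element is emphasized.
        if text and any(self._stack):
            self.fragments.append(text)


def extract_emphasized(html):
    """Return the emphasized text fragments found in an HTML string."""
    parser = EmphasisExtractor()
    parser.feed(html)
    parser.close()
    return parser.fragments


sample = (
    '<p>Intro <strong>Flood warning</strong> text '
    '<span style="font-weight: bold">Rescue update</span><br>tail</p>'
)
print(extract_emphasized(sample))
```

In a pipeline of the kind the abstract describes, fragments like these could supplement or replace the title and introduction as input to topic detection; how the paper actually weights or uses them is not stated here.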