| In this era of data explosion, people generate large amounts of data all the time. They connect through mobile phones and computers, discuss topics on forums and microblogs, and exchange information in real time through QQ and WeChat. Major world events and small family matters alike spread quickly online. How can the huge volume of data generated by human activity be put to use, and what patterns can be found in it? Answering these questions requires analyzing text data with natural language processing, which is one of today's research hotspots. Recently, the number of Sina Weibo users has grown very rapidly and has far surpassed that of forums such as Post Bar. People are more willing to express their opinions on Weibo, so its influence should not be underestimated. This paper therefore takes Weibo as its entry point and studies text analysis based on Weibo data. However, because of the particular characteristics of Weibo data, traditional methods do not achieve good results. The traditional TFIDF algorithm is relatively simple and very fast, but its drawbacks are also obvious: the weights it computes are not satisfactory, and the keyword weights it computes for short texts are especially flawed. Using the traditional TFIDF algorithm to analyze microblog data therefore does not give good results. The algorithm performs poorly on short texts, and its performance declines linearly when the data are unevenly distributed between classes and within classes. To improve the performance of text analysis, this paper improves the TFIDF algorithm for keyword extraction. The main contributions are: (1) We correct the deviation between the weights calculated by the traditional TFIDF algorithm and the actual weights by training on suppression texts and gain texts to obtain new weight values. (2) We introduce the improved algorithm into a big data platform so that it can perform text analysis on massive data. (3) We apply the algorithm to hot topic discovery, one direction of text analysis, and show the hot topic mining results intuitively. On this basis, we designed and implemented a text analysis system based on Weibo data and compared the improved algorithm with other algorithms. The experimental results show that the improved method works well. |
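
For reference, the short-text weakness described above can be reproduced with a minimal sketch of the classic TFIDF weighting. This is the textbook formula (term frequency times the natural log of inverse document frequency), not the improved algorithm proposed in this paper. The function name `tfidf` and the sample posts are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (classic TFIDF)."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # tf = relative frequency in the document, idf = ln(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-tokenized short posts (assumption for illustration).
posts = [
    ["typhoon", "warning", "issued", "today"],
    ["typhoon", "school", "closed", "today"],
    ["concert", "tickets", "sold", "out"],
]
weights = tfidf(posts)
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.4f}")
```

In posts this short, almost every term occurs exactly once, so the term-frequency factor is the same for all terms and the ranking is decided by document frequency alone. In the first post, the topical word "typhoon" receives the same weight as the generic word "today" and a lower weight than "warning", simply because it also appears in another post.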