| Microblogging, one of the most mainstream social networking platforms today, has become a significant way for users to propagate and retrieve information. A microblog topic model can help users find interesting posts and similar users among massive amounts of information. However, traditional topic models do not work well for this task, because microblog messages are short, the content updates quickly, and the data volume is huge. In this paper, we survey existing topic modeling methods and propose an improved topic model, MBUT-LDA, that accounts for the microblog user and time dimensions. The method has the following characteristics: ⑴ Building on the Author-Topic model and Multi-grain Topic Models, this paper clusters each user's microblog posts according to their time distribution. This addresses the incomplete information caused by short microblog messages. Furthermore, using the time dimension allows MBUT-LDA to make topics more accurate. ⑵ Based on an analysis of the relationships between microblog users and their friends, we propose the concept of "attention". Combining the "attention" formula with "TF-IDF", we propose a new formula, "ATF-IDF", which can be used to measure how well a word predicts a topic. ⑶ With the development of the mobile internet, the number of microblog users is growing dramatically, which results in the accumulation of data. When faced with large datasets, traditional single-node techniques become less practical under limited resources. We therefore present a distributed MBUT-LDA based on Hadoop in order to process large-scale microblog data. We use Sina microblog data to evaluate the distributed MBUT-LDA. The experiments show that the optimized MBUT-LDA achieves better perplexity and accuracy than MBUT-LDA and U-LDA (LDA based on user).
Furthermore, the running time of distributed MBUT-LDA decreases as the number of nodes increases, and it handles large data effectively. |
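
Perplexity, one of the evaluation metrics named in the abstract, is conventionally defined for topic models as the exponential of the negative total log-likelihood of held-out documents divided by their total token count; lower values indicate a better fit. The following is a minimal sketch of that standard definition only. It assumes the per-document log-likelihoods have already been computed by some topic model, and the function name `perplexity` and its arguments are hypothetical, not taken from the paper.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Standard held-out perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    doc_log_likelihoods: natural-log likelihood of each document (assumed
        to come from an already-trained topic model; hypothetical input).
    doc_lengths: number of tokens in each document, in the same order.
    """
    if len(doc_log_likelihoods) != len(doc_lengths):
        raise ValueError("need one log-likelihood per document")
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("total token count must be positive")
    # Average negative log-likelihood per token, then exponentiate.
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)
```

As a sanity check, a model that assigns every token a uniform probability over a vocabulary of 4 words has a perplexity of 4, regardless of document length.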