| Microblogging (micro-blog) is a platform for sharing, disseminating and accessing information based on user relationships. Users can log in through the Web, WAP and other clients, and can post and share text updates of at most 140 characters (including punctuation). As an important social networking platform, microblogging increasingly affects people's life and work through its convenience, innovation, originality and interactivity. In recent years, a growing body of research on microblogging has covered topic and event analysis, sentiment analysis, information retrieval and recommendation, network analysis, information dissemination, influence analysis, and so on. Targeted advertising driven by interest models of microblog users has grown into a main profit model. Research on methods for modeling the interests of microblog users therefore plays a significant role in helping microblogging sites improve customer satisfaction, profitability and development. Sun Wei, in "Interest Mining and Modeling for Micro-bloggers of Micro-blog" (2012), and Chou Jun, in "Research on users interest modeling based on microblog social network" (January 2013), each proposed an approach to user interest modeling based on microblogging, but with different emphases. The former focuses on selecting the information that microblog users are interested in, while the latter focuses on the composition of the microblog social network. This paper starts from a different point than both and focuses on the preprocessing of microblog text. It consists of two parts: preprocessing of microblog text and interest modeling for microblog users.
The main contents and results are as follows: ① Research on stop-word filtering: Stop-word filtering is an essential part of Chinese text processing, and its accuracy directly affects subsequent text analysis, content extraction and correlation studies. Based on the characteristics of stop words, this paper proposes a context-based definition of stop words. By analyzing the part-of-speech features of stop words and the characteristics of Chinese microblog text, it proposes a stop-word filtering rule for microblog text preprocessing that effectively removes stop words from microblog text. The approach is quick, simple and effective, and it supports follow-up studies of user interest modeling based on microblog text. ② Research on new-word discovery: Microblogging is a fast-moving source of new words and Internet vocabulary. Research shows that 60% of word segmentation errors are caused by new words, and segmentation errors in turn make subsequent user interest modeling inaccurate. Research on new-word discovery for microblog text is therefore an effective way to improve the accuracy of user interest modeling. Based on how new words are formed, this paper proposes a definition of microblog new words based on adjacent phrases; based on the characteristics of microblog text, it identifies special microblog text marked by the two special symbols "@" and "#", and refines the candidate set of new words using a multivariate expansion method. ③ Research on representing the interests of microblog users: Microblog text information, including personalized tags, original posts, reposts and comments, constitutes the initial user interest information. Because microblog text is short text, it suffers from data sparseness and fragmentation.
To mitigate these problems, this paper puts forward the concept of a combined vector space model. From the perspective of short-text expansion, it also builds a concept glossary from the "Synonyms Cilin" thesaurus and uses concept mapping to extend the vector. The resulting microblog user interest model reflects users' interests well at a fine granularity. |
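
The concept-mapping step described above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper. It assumes that "combining" means summing the term vectors built from a user's tags, original posts, reposts and comments, and it replaces the "Synonyms Cilin" glossary with a tiny hypothetical table (`CONCEPT_OF`). The function names and the sample data are likewise hypothetical, and the input is assumed to be already segmented and stop-word filtered.

```python
from collections import Counter

# Hypothetical toy glossary standing in for a thesaurus such as "Synonyms Cilin":
# each word maps to a coarser concept label. Not real thesaurus data.
CONCEPT_OF = {
    "football": "sport",
    "soccer": "sport",
    "basketball": "sport",
    "film": "movie",
    "cinema": "movie",
}


def concept_vector(tokens):
    """Count tokens after mapping each one to its concept.

    Tokens missing from the glossary are kept unchanged, so no term is lost.
    """
    return Counter(CONCEPT_OF.get(tok, tok) for tok in tokens)


def combined_interest_vector(texts_by_source):
    """Sum the concept vectors of all text sources and normalize to weights.

    `texts_by_source` maps a source name (e.g. "tag", "original", "repost",
    "comment") to a list of tokens. Summing is an assumed reading of the
    "combined" vector space model.
    """
    combined = Counter()
    for tokens in texts_by_source.values():
        combined.update(concept_vector(tokens))
    total = sum(combined.values())
    if total == 0:
        return {}
    return {concept: count / total for concept, count in combined.items()}


# Hypothetical user whose short texts share almost no surface words.
user_texts = {
    "tag": ["football"],
    "original": ["soccer", "film"],
    "repost": ["cinema"],
    "comment": ["basketball", "weather"],
}
interest = combined_interest_vector(user_texts)
print(sorted(interest.items(), key=lambda item: -item[1]))
```

In this toy example the six distinct surface words collapse into three dimensions ("sport", "movie", "weather"), which shows how concept mapping can reduce the sparseness of short-text vectors.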