| Online social networks have become a major platform for people to communicate and exchange information. Microblogs play an important role in social networks and have become one of the important media. Because Sina microblog is the largest microblog platform, analyzing and predicting the online behavior of its users directly supports public opinion guidance and enterprise microblog marketing. Research on microblog user behavior therefore provides an important reference for corporate and government decision-making. This paper completes the following work. First, it analyzes the theory and methods of Chinese text feature extraction, including word segmentation, keyword extraction, weight setting, and the DF, MI, CHI, TF-IDF and information gain methods. It then studies the shortcomings of text categorization models, including the KNN algorithm, the class centre vector algorithm, the Bayesian algorithm and the logistic regression algorithm. Second, a statistical analysis is conducted over the whole training set of microblogs, covering the proportion of microblogs with no user behavior, the proportion with user behavior, and the number of microblogs with every type of user behavior. Microblog user behavior and the number of microblogs follow a power-law distribution. User actions conform to the rule "likecount > commentcount > forwardcount". Likecount, commentcount and forwardcount have a relatively high clustering coefficient and a relatively small average distance. For each user, each type of behavior has a central point. I extract the keywords of the different user actions using fuzzy sets and the information gain algorithm. By cluster analysis of the time at which each microblog was created, I obtain the relation between user actions and microblog creation time. Finally, the centre vector algorithm is combined with fuzzy sets to form a new cluster centre vector algorithm. Because the K value of the traditional KNN algorithm is uncertain, KNN is improved by using a category collection in place of the required-distance method. The improved KNN is combined with the new cluster centre vector algorithm to realize the prediction of user behavior. |
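
For orientation, the plain class centre vector classifier named above can be sketched as follows. This is a minimal illustration of the textbook technique only, not the fuzzy-set variant or the improved KNN developed in this paper. The function names, the toy tokens and the labels `forward` and `like` are all assumptions made for the example, and raw term frequency stands in for whatever weighting scheme is actually used.

```python
import math
from collections import Counter, defaultdict


def tf_vector(tokens):
    """Return a unit-length term-frequency vector as a dict (assumed weighting)."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


def class_centres(samples):
    """Average the document vectors of each class into one centre vector."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for tokens, label in samples:
        sizes[label] += 1
        for term, weight in tf_vector(tokens).items():
            sums[label][term] += weight
    return {
        label: {term: w / sizes[label] for term, w in vec.items()}
        for label, vec in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(tokens, centres):
    """Assign the class whose centre vector is most similar to the input."""
    vec = tf_vector(tokens)
    return max(centres, key=lambda label: cosine(vec, centres[label]))


# Hypothetical, already-segmented microblogs with a dominant user action.
TRAIN = [
    (["discount", "sale", "lottery"], "forward"),
    (["lottery", "repost", "prize"], "forward"),
    (["photo", "selfie", "cute"], "like"),
    (["cute", "cat", "photo"], "like"),
]
CENTRES = class_centres(TRAIN)
```

Calling `predict(["lottery", "prize"], CENTRES)` assigns the microblog to the class whose centre it most resembles. Unlike KNN, the method compares each input against one vector per class and needs no K value.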