| A bulletin board system (BBS) is an electronic information service with rich content, strong interaction among its users, and very fast information propagation. Forum users tend to have a relatively high professional level and a clear sense of purpose. To monitor, manage, and control a BBS effectively, and to maintain a safe, healthy, and civilized Internet environment, the analysis of BBS public opinion is becoming increasingly important, and it is of particularly far-reaching significance for college forums. This study is based on a project on university public opinion undertaken by the Pattern Recognition Laboratory, Beijing University of Posts and Telecommunications. My main work and contributions in this project are as follows: 1. Research on user action analysis and document representation models. Drawing on recent literature, the study summarizes the main methods currently used for user action analysis and compares the commonly used document representation models to identify their respective advantages and disadvantages. 2. User action analysis. By collecting, counting, and analyzing forum information in real time, features of user behavior are obtained, such as the number of posts published or replied to, from which the active users and opinion leaders of the BBS can be detected. Abnormal users are detected by searching and filtering the content of their posts. The surfing habits of a single user and of the whole user group are obtained by counting the posts published by an individual user or by all users in different time periods. 3. Document representation based on a word correlation matrix. Document representation is the premise and basis of semantic analysis, text clustering, and classification, and is of great significance in information retrieval and data mining. 
Most current document representation methods assume that the words constituting a text are independent of one another, and therefore ignore the inner relations between words. To address this, a document representation based on a word correlation matrix is presented in this paper. A frequency threshold and a word correlation threshold are set to avoid the noise introduced by computing correlations between irrelevant words. Experimental results with K-means clustering show that this representation describes document characteristics more accurately and improves clustering quality. |
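The abstract does not give the exact correlation measure or the way the matrix enters the document vector, so the following is only a minimal sketch under stated assumptions. Correlation is taken here to be the Jaccard coefficient of document-level co-occurrence, and a document is represented as its term-frequency vector multiplied by the correlation matrix. The names `build_correlation`, `represent`, `min_freq`, and `min_corr` are hypothetical and stand in for the frequency threshold and word correlation threshold mentioned above.

```python
from collections import Counter
from itertools import combinations


def build_correlation(docs, min_freq=2, min_corr=0.3):
    """Build a word correlation matrix from tokenized documents.

    Assumption (not from the source): correlation is the Jaccard
    coefficient of the sets of documents in which two words occur.
    """
    # Document frequency: in how many documents each word appears.
    df = Counter(w for doc in docs for w in set(doc))
    # Frequency threshold: drop rare words before computing correlations.
    vocab = sorted(w for w, c in df.items() if c >= min_freq)
    index = {w: i for i, w in enumerate(vocab)}

    # Count documents in which each pair of kept words co-occurs.
    co = Counter()
    for doc in docs:
        present = sorted({index[w] for w in doc if w in index})
        for i, j in combinations(present, 2):
            co[(i, j)] += 1

    n = len(vocab)
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        corr[i][i] = 1.0  # every word is fully correlated with itself
    for (i, j), c in co.items():
        jaccard = c / (df[vocab[i]] + df[vocab[j]] - c)
        # Correlation threshold: weak links are treated as noise.
        if jaccard >= min_corr:
            corr[i][j] = corr[j][i] = jaccard
    return vocab, corr


def represent(doc, vocab, corr):
    """Map a tokenized document to its term-frequency vector times corr."""
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    tf = [0.0] * n
    for w in doc:
        if w in index:
            tf[index[w]] += 1.0
    # Each output dimension collects weight from correlated words.
    return [sum(tf[i] * corr[i][j] for i in range(n)) for j in range(n)]
```

In this sketch, a document containing only "forum" also receives weight on a word such as "post" if the two always co-occur in the corpus, which is the kind of inter-word relation that an independence-based model discards. The resulting vectors could then be passed to any K-means implementation.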