| With the rapid development of Web 2.0, more and more netizens are accustomed to publishing opinions on online platforms such as BBS forums and blogs. These discrete texts, stored in scattered locations and expressing different views, together constitute an all-encompassing body of public web opinion. Qualitative and quantitative analysis of sentiment polarity in discrete texts is an important way to understand public opinion online and netizens' attitudes towards things and events. On that basis, clustering analysis of time-varying public web opinion, together with visualization of the results, can vividly represent the tendency of public opinion; this is a hot issue of common concern in many fields. In summary, the thesis carries out public opinion analysis through clustering analysis, with sentiment polarity as the clue and opinion mining as the strategy. Research on opinion mining of Chinese texts started late, and much fundamental work is still in progress. Research on public opinion analysis of discrete web texts is still at an initial stage. The thesis focuses on the characteristics of discrete texts in order to perform clustering analysis on public opinion. The thesis studies the titles and snippets of blog texts. Blog texts carry rich sentiment whose polarity is scattered throughout the text, so it is difficult to extract their key semantics or central concepts. Titles and snippets, however, contain relatively few sentiment words and express a concentrated concept. Selecting the titles and snippets of blog texts as the research object is therefore an important measure to accelerate clustering convergence. The experiment in the thesis consists of clustering analysis of public opinion in blog texts and evaluation of the clustering results. The clustering analysis comprises two parts: a clustering analysis model based on the concept of public opinion, and visualization of the clustering results. 
The thesis improves the traditional vector space model (VSM) by introducing the concepts of words, and uses the resulting concept-based VSM to represent blog texts (titles and snippets) in order to improve the precision of text representation. Blog texts are represented by both the term-based VSM and the concept-based VSM, and each representation is clustered with the k-means algorithm. Finally, the clustering results are visualized and evaluated. The traditional term-based VSM serves as the comparison group for evaluating the performance of concept-based clustering analysis of public opinion. The clustering results are evaluated against ground truth with three common metrics: Precision, Entropy and Rand Index. The experiments show that the concept-based VSM performs better than the traditional term-based VSM in public opinion clustering of discrete texts. |
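
The three ground-truth metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give their formulas, so the definitions below are assumptions: Precision is taken in the purity sense (each cluster is credited with its majority ground-truth class), Entropy is the size-weighted class entropy within each cluster, and the Rand Index is the fraction of text pairs on which the clustering and the ground truth agree. The function names and the toy labels are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import combinations
from math import log2


def contingency(pred, truth):
    """Map each cluster id to a Counter of ground-truth classes inside it."""
    table = {}
    for cluster, label in zip(pred, truth):
        table.setdefault(cluster, Counter())[label] += 1
    return table


def precision(pred, truth):
    """Purity-style precision (assumed definition): share of texts that
    belong to the majority ground-truth class of their cluster."""
    table = contingency(pred, truth)
    return sum(max(counts.values()) for counts in table.values()) / len(pred)


def entropy(pred, truth):
    """Size-weighted average of the class entropy of each cluster (in bits).
    Lower is better; 0 means every cluster holds a single class."""
    table = contingency(pred, truth)
    n = len(pred)
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        h = -sum((k / size) * log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total


def rand_index(pred, truth):
    """Fraction of text pairs on which clustering and ground truth agree:
    either grouped together in both, or separated in both."""
    pairs = list(combinations(range(len(pred)), 2))
    agree = sum(
        (pred[i] == pred[j]) == (truth[i] == truth[j]) for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    # Toy data (hypothetical): six snippets, two clusters, two polarity classes.
    pred = [0, 0, 0, 1, 1, 1]
    truth = ["pos", "pos", "neg", "neg", "neg", "neg"]
    print(precision(pred, truth), entropy(pred, truth), rand_index(pred, truth))
```

Under these assumed definitions, a comparison of the kind the abstract describes would compute the three scores once for the term-based clustering and once for the concept-based clustering against the same ground-truth labels; higher Precision and Rand Index and lower Entropy would indicate the better representation.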