| With the rapid expansion of information technology, the Internet has become the main platform for publishing, exchanging, and acquiring information, with a huge number of users. Internet public opinion, which has a vital influence on society, is the aggregate of individual attitudes or beliefs expressed on the Internet and held by the adult population of a given area over a given period. As a way to collect public opinion, Internet public opinion mining has become increasingly important and is now a research focus. This research is of great practical significance for processing log data, which is massive in volume and demands both timeliness and accuracy. This thesis focuses on Internet public opinion mining techniques. After clarifying the notion of public opinion and related concepts, it acquires knowledge mainly from search engine log data and concentrates on text mining of search engine logs. Because the source data consists of short texts in massive quantities, the thesis applies natural language processing and parallel computing. Text classification first partitions the mass data into coarse-grained classes; unsupervised clustering is then used to detect hot events in public opinion. To suit parallel computing, the thesis applies a new algorithm, Pail Clustering. In Chinese word segmentation, adding term weighting methods improves the text similarity computation, which in turn increases the accuracy of event detection. Building on event detection, the thesis computes statistics over the log data on a distributed platform. By building a data warehouse, it implements multi-dimensional analysis of log data, trend analysis of events, hot-word recommendation, and visualization. Drawing on the research and application of the above problems, the thesis designs and implements a search engine log analysis system. 
This system integrates the Pail Clustering algorithm and the term weight calculation algorithm for practical use, providing statistical analysis of log data, event analysis, and visualization. It can provide users with more effective results for the analysis of log data... |
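
The abstract does not say which term weighting scheme is used, so the following is only a minimal sketch of how term weights can change the similarity between short queries. It assumes a smoothed TF-IDF weight and cosine similarity, and it assumes the queries have already been segmented into tokens; the function names and the sample queries are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per query
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def weighted_vector(tokens, idf):
    """Sparse term vector: term frequency times IDF weight.

    Terms missing from the IDF table get weight 1.0 (an assumption).
    """
    tf = Counter(tokens)
    return {t: f * idf.get(t, 1.0) for t, f in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already segmented search queries.
queries = [
    ["earthquake", "news"],
    ["earthquake", "relief"],
    ["weather", "news"],
    ["football", "news"],
    ["stock", "news"],
]

idf = idf_weights(queries)
vectors = [weighted_vector(q, idf) for q in queries]

# Queries 0 and 1 share the rarer term "earthquake";
# queries 0 and 2 share only the common term "news".
sim_same_event = cosine(vectors[0], vectors[1])
sim_common_word = cosine(vectors[0], vectors[2])
```

Without weighting, both pairs share one of two terms and would score the same. With the weights, the pair sharing the rarer term scores higher, which is the kind of effect that could help a clustering step keep queries about the same event together.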