| With the development of the Internet, the number of Internet users has increased rapidly. More and more people express their feelings and thoughts online, and the Internet is becoming a center for distributing information and even for shaping public sentiment. Against this background, false statements and malicious exaggeration about sensitive cases and emergencies can deceive and mislead people, so it is necessary to supervise online topics and expressions effectively. Grasping public opinion trends from massive network text data depends on accurate and efficient analysis and organization of texts that contain different kinds of information. Analyzing public opinion is therefore of great significance for maintaining social order and promoting national development, and research on technologies for acquiring and analyzing Internet public opinion has become an urgent and important issue. The thesis introduces and analyzes information acquisition, information preprocessing, and public opinion analysis, and studies the key technologies involved in a public opinion analysis system, including the feature selection problem and the feature weighting problem: (1) This thesis focuses on the feature selection problem. The Information Gain (IG) algorithm for text feature selection often selects features that are low-frequency in the designated category but high-frequency in other categories, which is clearly not the desired result for feature selection. To overcome this shortcoming, the thesis proposes an improved IG approach based on a Compensation Factor and a Penalty Factor for feature distribution. Experimental results show that the improved method can effectively balance the information contributed by the presence and absence of a feature, and achieves better classification results. (2) This thesis focuses on the feature weighting problem. 
The term weighting algorithm has a great impact on classification results, but traditional algorithms do not consider distribution information among and inside classes. The thesis introduces three improvements, Skew Information among classes (SI), Distribution Information inside a class (DI), and a Weight Adjustment factor (WA), and, after an in-depth analysis of these improvements, puts forward a new term weighting algorithm based on WA-DI-SI. Validation with an SVM classifier shows that the method outperforms the compared methods, which indicates that the improved algorithm is feasible. Using the above research results, the thesis designs and implements a network public opinion system. |
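
The weakness of Information Gain described in point (1) can be illustrated with a minimal sketch of the standard (unimproved) IG score. It is not the thesis's improved method: the Compensation Factor and Penalty Factor are not defined in this abstract and are not reproduced here. The function names `entropy` and `information_gain` and the toy corpus are assumptions made only for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (base 2) of a discrete distribution; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(docs, labels, term):
    """Standard IG of a term: H(C) - H(C | term present/absent).

    docs   -- list of sets of terms, one set per document (toy representation)
    labels -- list of class labels, aligned with docs
    """
    n = len(docs)
    classes = sorted(set(labels))

    def class_dist(indices):
        # Class distribution restricted to the given document indices.
        if not indices:
            return []
        return [sum(1 for i in indices if labels[i] == c) / len(indices)
                for c in classes]

    with_t = [i for i in range(n) if term in docs[i]]
    without_t = [i for i in range(n) if term not in docs[i]]

    h_prior = entropy(class_dist(list(range(n))))
    h_cond = (len(with_t) / n) * entropy(class_dist(with_t)) \
           + (len(without_t) / n) * entropy(class_dist(without_t))
    return h_prior - h_cond

# Hypothetical toy corpus: two "sports" and two "finance" documents.
docs = [{"goal", "match"}, {"goal", "team"},
        {"stock", "market"}, {"stock", "bank"}]
labels = ["sports", "sports", "finance", "finance"]

ig_goal = information_gain(docs, labels, "goal")    # occurs only in "sports"
ig_stock = information_gain(docs, labels, "stock")  # never occurs in "sports"
```

In this toy corpus "stock" never appears in the "sports" documents, yet it receives the same IG score as "goal", because standard IG rewards the absence of a term as much as its presence. This is the kind of imbalance between feature presence and absence that the improved approach is said to correct.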