| With the rapid development and spread of the Internet, the network has become an indispensable part of daily life. People read, watch videos, study, and hold discussions online, and they often leave comments on topic pages or on product pages when shopping. In particular, after buying from an e-commerce site, consumers leave reviews of the products they purchased. As the volume of online review data surges, making effective use of it is receiving more and more attention. Mining useful information from large numbers of reviews matters greatly to site users, service providers, and manufacturers. However, the information on review pages is unstructured: review data has no predefined data model. Because review data is irregular and ambiguous compared with data stored in database fields, it is difficult to analyze, count, or summarize with conventional programs. This paper explores how to extract useful information from massive review data so that users, service providers, and manufacturers can exploit it effectively and quickly. The main tasks of this paper are sentiment classification and value-based classification of review data. For sentiment classification, an unsupervised learning method based on sentiment-word extraction and pointwise mutual information (PMI) is proposed. The text is first segmented into words and tagged with parts of speech, and only the sentiment words are extracted, which reduces the noise introduced into classification. The sentiment value of a review document is then obtained by computing the PMI between its sentiment words and reference (seed) words. For users, sentiment alone does not meet the needs of review readers, so a maximum entropy method that introduces a text-pointing feature is proposed for classifying reviews by value. 
When judging whether a review carries valuable information, this method considers not only the length of the text, the presence of descriptive words, and other such features, but also whether the review text is related to the topic it points to. An IP-tree-based method for identifying text pointing is proposed, and its identification results are incorporated as features into model training. The results show that a classifier built with this method achieves better classification results. |
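
The PMI-based sentiment scoring described above can be illustrated with a minimal sketch. It assumes a Turney-style semantic orientation (PMI with positive seed words minus PMI with negative seed words), document-level co-occurrence counts, and additive smoothing; the abstract does not specify these details. The seed lists, the `eps` constant, the toy reviews, and all function names are hypothetical, and the set of sentiment words is assumed to come from a separate segmentation and part-of-speech tagging step.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical seed (reference) words; the source does not list its own.
POSITIVE_SEEDS = ("good", "excellent")
NEGATIVE_SEEDS = ("bad", "poor")


def count_documents(reviews):
    """Return (word_df, pair_df, n): document frequencies of words and of word pairs."""
    word_df, pair_df = Counter(), Counter()
    for tokens in reviews:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # pairs are stored in sorted order
    return word_df, pair_df, len(reviews)


def pmi(a, b, word_df, pair_df, n, eps=0.5):
    """Smoothed PMI(a, b) = log2(P(a, b) / (P(a) * P(b))); eps is an assumed constant."""
    joint = word_df[a] if a == b else pair_df[tuple(sorted((a, b)))]
    p_ab = (joint + eps) / n
    p_a = (word_df[a] + eps) / n
    p_b = (word_df[b] + eps) / n
    return math.log2(p_ab / (p_a * p_b))


def so_pmi(word, stats):
    """Orientation of a word: summed PMI with positive seeds minus negative seeds."""
    pos = sum(pmi(word, s, *stats) for s in POSITIVE_SEEDS)
    neg = sum(pmi(word, s, *stats) for s in NEGATIVE_SEEDS)
    return pos - neg


def review_score(tokens, sentiment_words, stats):
    """Sum SO-PMI over the review's sentiment words only; other tokens are ignored."""
    return sum(so_pmi(t, stats) for t in tokens if t in sentiment_words)


# Toy, already-tokenized reviews (illustrative data, not from the paper).
reviews = [
    ["good", "fast", "delivery"],
    ["good", "fast", "screen"],
    ["excellent", "fast", "battery"],
    ["bad", "slow", "delivery"],
    ["bad", "slow", "screen"],
    ["poor", "slow", "battery"],
]
stats = count_documents(reviews)
sentiment_words = {"fast", "slow"}  # assumed output of a POS-based extraction step
score = review_score(["fast", "screen"], sentiment_words, stats)
print("positive" if score > 0 else "negative")
```

Restricting the sum to extracted sentiment words mirrors the noise-reduction step in the abstract: tokens such as "screen" co-occur equally with both seed sets and would only dilute the score.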