| With the development of Web 2.0, Internet users have changed from readers of information into its publishers. Especially since the rise of e-commerce and social networking, users have become the largest source of information on the Internet. This user-generated data is valuable to other Internet users, product manufacturers, service providers, and even government departments: analyzing and processing it reveals user behavior, current trending events, and so on. However, the volume of such data far exceeds what humans can process, which makes computers the natural tool, and opinion mining emerged as a natural language processing task to address it. Opinion mining aims to extract an opinion summary from opinionated text, giving users a more intuitive and comprehensive view of the overall situation and providing a basis for decision making. This paper studies two problems in fine-grained opinion mining: the extraction of opinion information and its classification. We build a subjective corpus from online comments and, on this basis, analyze the types and characteristics of opinion information to provide theoretical support for its extraction. For extraction, we use a conditional random field (CRF) with word-cluster features obtained from an unsupervised learning model; experiments show that this improves performance by nearly 10% over the short-dependency baseline. After extraction, the opinion information must be classified to reduce errors caused by the same theme appearing in different surface forms. For this we use cluster analysis based on semantic similarity calculation and seed sets, which improves performance by about 6% over the baseline. |
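To make the extraction step more concrete, the following is a minimal sketch of how word-cluster information can be turned into per-token CRF features. It assumes a hypothetical word-to-cluster map with Brown-style bit-string IDs (the abstract does not name the unsupervised model), and the names `CLUSTERS`, `token_features`, and `sentence_features` are illustrative, not the paper's code. It only builds features; training and decoding are left to a CRF library.

```python
from typing import Dict, List, Sequence

# Hypothetical word -> cluster-ID map. In practice it would be learned without
# supervision (e.g. Brown clustering, whose IDs are bit strings); these entries
# are illustrative only.
CLUSTERS: Dict[str, str] = {
    "screen": "0110",
    "display": "0110",
    "battery": "0111",
    "great": "1010",
    "poor": "1011",
}


def token_features(tokens: List[str], i: int, clusters: Dict[str, str],
                   prefix_lengths: Sequence[int] = (2, 4)) -> Dict[str, str]:
    """Build the feature dict for tokens[i].

    String values are the form CRFsuite-based taggers (e.g. sklearn-crfsuite)
    turn into binary "key=value" features.
    """
    word = tokens[i].lower()
    feats: Dict[str, str] = {"bias": "1", "word": word}

    # Word-cluster features: the full ID plus coarser prefixes, so rare words
    # share evidence with frequent words in the same cluster.
    cid = clusters.get(word)
    if cid is not None:
        feats["cluster"] = cid
        for k in prefix_lengths:
            feats[f"cluster[:{k}]"] = cid[:k]

    # Short-range context: the immediately preceding and following words,
    # plus their cluster IDs when known.
    for offset in (-1, 1):
        j = i + offset
        key = f"word[{offset:+d}]"
        if 0 <= j < len(tokens):
            neighbour = tokens[j].lower()
            feats[key] = neighbour
            ncid = clusters.get(neighbour)
            if ncid is not None:
                feats[f"cluster[{offset:+d}]"] = ncid
        else:
            feats[key] = "<PAD>"
    return feats


def sentence_features(tokens: List[str],
                      clusters: Dict[str, str]) -> List[Dict[str, str]]:
    """One feature dict per token, i.e. one CRF input sequence."""
    return [token_features(tokens, i, clusters) for i in range(len(tokens))]


feats = sentence_features(["The", "screen", "is", "great"], CLUSTERS)
```

Lists of such dicts, paired with label sequences (a BIO scheme marking opinion targets and expressions is one plausible choice, assumed here), could be passed to a trainer such as `sklearn_crfsuite.CRF().fit(X, y)`. The cluster prefixes are the design point: they let the model generalize from `screen` to an unseen word like `display` that shares a cluster.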
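The classification step can likewise be sketched as seed-based assignment: each extracted term joins the category whose seed words it is, on average, most similar to. This is an illustration under stated assumptions, not the paper's method. The abstract does not specify the similarity measure, so cosine similarity over toy word vectors is used as a stand-in, and `VECTORS`, `SEEDS`, `assign_to_seeds`, and the `threshold` value are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]

# Hypothetical toy word vectors and seed sets, for illustration only.
VECTORS: Dict[str, Vector] = {
    "screen": [1.0, 0.0, 0.0],
    "display": [0.9, 0.1, 0.0],
    "monitor": [0.8, 0.2, 0.0],
    "battery": [0.0, 1.0, 0.0],
    "battery life": [0.1, 0.9, 0.0],
    "price": [0.0, 0.0, 1.0],
}
SEEDS: Dict[str, List[str]] = {
    "screen": ["screen", "display"],
    "battery": ["battery"],
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def assign_to_seeds(terms: List[str], seeds: Dict[str, List[str]],
                    vectors: Dict[str, Vector],
                    threshold: float = 0.5) -> Dict[str, Optional[str]]:
    """Map each extracted term to the seed category it is most similar to.

    A category's score is the mean cosine similarity to its seed words.
    Terms with no vector, or whose best score is below threshold, map to None.
    """
    result: Dict[str, Optional[str]] = {}
    for term in terms:
        best_cat: Optional[str] = None
        best_sim = -math.inf
        vec = vectors.get(term)
        if vec is not None:
            for cat, seed_words in seeds.items():
                sims = [cosine(vec, vectors[s]) for s in seed_words if s in vectors]
                if not sims:
                    continue
                score = sum(sims) / len(sims)
                if score > best_sim:
                    best_cat, best_sim = cat, score
        result[term] = best_cat if best_sim >= threshold else None
    return result


groups = assign_to_seeds(["monitor", "battery life", "price", "unknown"],
                         SEEDS, VECTORS)
```

With this toy data, `monitor` is grouped under `screen` and `battery life` under `battery`, so different surface forms of the same theme are merged, while `price` (no similar seeds) and `unknown` (no vector) stay unassigned. A fuller version might iterate, adding confidently assigned terms back into the seed sets.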