| With the development of Internet technologies and applications, the Internet has evolved from an "information bulletin" into a platform on which people create content. Emerging social media such as Weibo in particular gather abundant user-generated content that carries a large amount of sentiment information, so sentiment analysis on Weibo has become a research hotspot with significant research value. Classifying words by their emotional tendency is the foundation of sentiment analysis. Compared with other words, new words in Weibo play a more significant role in users' emotional expression. This paper therefore focuses on identifying the sentiment polarity of new words in Weibo. This paper covers the following aspects: 1. A sentiment polarity identification system for new words in Weibo is designed and implemented. The system selects suitable candidate words from the corpus and determines their sentiment polarity using the phrase templates method and the syntactically amended morpheme point-wise mutual information method (SAM-PMI). The system needs no labeled corpus, takes both statistical characteristics and syntactic rules into account, and is applicable to any new word. 2. The phrase templates method and the SAM-PMI method are proposed. The phrase templates method takes both statistical and syntactic information into account, overcoming the loss of semantic information caused by relying on context statistics alone. SAM-PMI addresses the data sparseness problem by making use of morphemes. Using modifiers, negation words, and phrase templates reduces the errors caused by relying purely on word co-occurrence information, and degree words are used to quantify the effect of sentiment morphemes. 3. Word polarity identification experiments are conducted on 10 million Weibo messages. 
The results show that the phrase templates method achieved an F-score of 0.61, compared with 0.52 for the baseline. SAM-PMI obtained a precision of 0.832, while the morpheme PMI method and the word PMI method achieved 0.772 and 0.723, respectively. |
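
To make the word PMI comparison point concrete, the following is a minimal sketch of polarity scoring by point-wise mutual information against seed words. It illustrates only the plain word-level idea, not the SAM-PMI method itself, whose morpheme and syntactic corrections are not specified in this abstract. The function names, the seed word lists, the document-level co-occurrence window, and the smoothing constant `eps` are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def build_counts(docs):
    """Count, per word and per unordered word pair, how many messages contain them."""
    word_df = Counter()
    pair_df = Counter()
    for tokens in docs:
        vocab = sorted(set(tokens))  # sorted so each pair has one canonical order
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return word_df, pair_df, len(docs)


def pmi(w1, w2, word_df, pair_df, n_docs, eps=0.5):
    """PMI of two words from message-level co-occurrence.

    eps is an additive smoothing constant (an assumption, not from the paper)
    that keeps the logarithm finite when two words never co-occur.
    """
    a, b = sorted((w1, w2))
    p_joint = (pair_df[(a, b)] + eps) / (n_docs + eps)
    p1 = (word_df[w1] + eps) / (n_docs + eps)
    p2 = (word_df[w2] + eps) / (n_docs + eps)
    return math.log2(p_joint / (p1 * p2))


def so_pmi(word, pos_seeds, neg_seeds, word_df, pair_df, n_docs):
    """Semantic orientation: association with positive seeds minus negative seeds."""
    pos = sum(pmi(word, s, word_df, pair_df, n_docs) for s in pos_seeds)
    neg = sum(pmi(word, s, word_df, pair_df, n_docs) for s in neg_seeds)
    return pos - neg


def polarity(score, threshold=0.0):
    """Map a score to a label; the zero threshold is a hypothetical default."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

A candidate word that mostly co-occurs with the positive seeds receives a positive score, and one that mostly co-occurs with the negative seeds receives a negative score. With rare new words the pair counts are often zero, which is the data sparseness problem that the morpheme-based variant is said to address.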