| With the development of social media, the number of microblog users has grown, and about 150 million people now use microblogs on a daily basis. Users express their opinions and emotions through microblogs. The enormous volume of data generated on these platforms has high commercial value, which has attracted researchers worldwide to the sentiment analysis of microblogs. However, only a small proportion of these data has been labelled by linguists, which is insufficient for training some machine learning algorithms, especially deep learning approaches. Earlier work proposed jointly training a language model (Skip-Gram, CBOW) on unlabelled microblogs together with the classifier to acquire more semantic information, which improves the performance of sentiment analysis. Building on that work, this thesis proposes a new model based on CRF and RNN. The main contributions are as follows: (1) An algorithm adds a special tag to verbs, adjectives and adverbs that follow negation words, so that the model better captures the emotion these words express. (2) Because the model tends to predict the sentiment classes that occur most frequently, this thesis applies an algorithm based on the similarity between microblogs to balance the datasets and alleviate this problem. By merging the two most similar microblogs of the most frequent sentiment class into one, the datasets are balanced without information loss. (3) A new model based on BLSTM and CRF is proposed, which is jointly trained as a language model and as a sequence tagger. Semantic knowledge gained from the language model improves the quality of sentiment analysis. Experiments in this thesis, including evaluations on NLPCC 2013, CCIR 2014 and other datasets, demonstrate the validity and robustness of this work. |
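
Contributions (1) and (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not specify the tag format, the negation scope, the similarity measure, or the stopping rule, so the `NOT_` prefix, the three-token window, the bag-of-words cosine similarity, the small English lexicon, and the rule "merge until the largest class is no larger than the second largest" are all assumptions chosen for illustration. Tokens are assumed to arrive already segmented and POS-tagged.

```python
from collections import Counter
from itertools import combinations
import math

# All constants below are hypothetical; the abstract does not define them.
NEGATION_WORDS = {"not", "no", "never"}   # illustrative negation lexicon
TAGGABLE_POS = {"VERB", "ADJ", "ADV"}     # illustrative POS labels
NEG_PREFIX = "NOT_"                       # illustrative special tag


def tag_negation(tagged_tokens, window=3):
    """Prefix verbs, adjectives and adverbs that follow a negation word.

    tagged_tokens: list of (word, pos) pairs.
    window: how many tokens after the negation word are in scope (assumed).
    """
    out = []
    remaining = 0
    for word, pos in tagged_tokens:
        if word.lower() in NEGATION_WORDS:
            remaining = window          # open a new negation scope
            out.append(word)
            continue
        if remaining > 0:
            remaining -= 1
            if pos in TAGGABLE_POS:
                out.append(NEG_PREFIX + word)
                continue
        out.append(word)
    return out


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def balance_by_merging(samples):
    """Repeatedly merge the two most similar samples of the largest class.

    samples: list of (tokens, label) pairs.
    Stops when the largest class is no larger than the second largest
    (an assumed stopping rule). Merging concatenates the token lists,
    so no token is discarded.
    """
    samples = [(list(tokens), label) for tokens, label in samples]
    while True:
        counts = Counter(label for _, label in samples).most_common()
        if len(counts) < 2 or counts[0][1] <= counts[1][1]:
            return samples
        major = counts[0][0]
        idx = [i for i, (_, label) in enumerate(samples) if label == major]
        bags = {i: Counter(samples[i][0]) for i in idx}
        # Pick the pair of majority-class samples with highest similarity.
        i, j = max(
            combinations(idx, 2),
            key=lambda p: cosine(bags[p[0]], bags[p[1]]),
        )
        merged = (samples[i][0] + samples[j][0], major)
        samples = [s for k, s in enumerate(samples) if k not in (i, j)]
        samples.append(merged)
```

For example, `tag_negation` applied to the tagged sentence "I do not like this movie" marks only `like`, and `balance_by_merging` reduces the majority class by one sample per merge while keeping every token. The pairwise search is quadratic in the size of the majority class, so a real dataset would likely need an approximate nearest-neighbour search instead.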