| As Internet technology evolves rapidly, more and more online social platforms play a vital role in people's daily lives. The development of the Internet brings convenience, broadens people's horizons, and creates an entirely new social space and social model. Today, netizens interact and spread their opinions online through social platforms such as Sina microblog and Tencent WeChat. For hot events in particular, Sina microblog has become an important front of people's online social life. In the course of online activity, textual carriers such as posted microblogs, replies, and repost comments are concrete expressions of emotion. Analyzing this text helps the government understand and guide current public opinion and make corresponding policies, and it helps enterprises adjust their operating strategies. This paper explores sentiment analysis of Sina microblog hot events. Traditionally, sentiment analysis applies either a dictionary-based method or a machine-learning-based method, and each yields relatively low accuracy when used alone. This paper improves both methods and then combines them to analyze the sentiment of microblog hot events. The work is as follows: 1) Data acquisition. Sina offers an API on its developer platform, but because data acquisition through the API is slow and limited in quantity, web crawlers are used as an additional data source. 2) Sentiment analysis based on a sentiment dictionary. Commonly used sentiment dictionaries are selected and merged to form the basic sentiment dictionary. Because some existing words now appear with new meanings, representative samples of internet words are analyzed statistically to revise their sentiment orientation. Among internet words, some words with no syntactic structure
often carry important emotional information, so typical new internet words are collected to assist in determining sentiment orientation. Meanwhile, an emoji dictionary is used to classify text containing emoji. To expand the basic sentiment dictionary, the Skip-gram model of Word2vec is used to represent each word as a vector so that the similarity between words can be computed; at the same time, the SO-PMI algorithm is used to compute the pointwise mutual information between candidate sentiment words and reference (seed) sentiment words. The sentiment dictionary is then expanded accordingly. 3) Sentiment analysis based on machine learning. The FastText model, together with a basic classifier, is used to determine sentiment orientation. FastText not only provides word representations but also retains the position of each word in the text, which introduces a position weight for sentiment words and improves accuracy. In addition, building on the KNN algorithm, this paper proposes the IDBKnn classification algorithm, which uses the inner-K average density and the class-center distance as weights. Several other classification algorithms are run on the Iris dataset for comparison, to verify the classification performance of IDBKnn. 4) A new approach to sentiment analysis. First, the expanded dictionaries are used to determine the sentiment orientation of short texts, and texts with a distinct sentiment orientation are selected as training data. These data are used to train a classifier, which then classifies the ambiguous texts. With these improvements, comparative experiments confirm that the sentiment analysis model combining the dictionary-based and machine-learning-based methods achieves higher accuracy than either method alone. |
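
The SO-PMI step mentioned in the abstract can be illustrated with a minimal sketch. It scores a candidate word as the sum of its pointwise mutual information (PMI) with positive seed words minus the sum of its PMI with negative seed words. The abstract does not give the co-occurrence unit, the seed lists, or the smoothing used, so everything here is an assumption for illustration: co-occurrence is counted per document, the function name `so_pmi`, the `smoothing` parameter, and the toy corpus are all hypothetical.

```python
import math


def so_pmi(docs, candidate, pos_seeds, neg_seeds, smoothing=1.0):
    """Semantic orientation of `candidate` from document-level co-occurrence.

    SO-PMI(w) = sum_p PMI(w, p) - sum_n PMI(w, n)
    A positive score suggests positive orientation, a negative score the opposite.
    """
    doc_sets = [set(doc) for doc in docs]
    n_docs = len(doc_sets)

    def prob(*words):
        # Smoothed fraction of documents containing all given words.
        # Smoothing (an assumption of this sketch) avoids log(0) for unseen pairs.
        hits = sum(1 for s in doc_sets if all(w in s for w in words))
        return (hits + smoothing) / (n_docs + smoothing)

    def pmi(w1, w2):
        return math.log2(prob(w1, w2) / (prob(w1) * prob(w2)))

    positive = sum(pmi(candidate, p) for p in pos_seeds)
    negative = sum(pmi(candidate, n) for n in neg_seeds)
    return positive - negative


# Toy, already-tokenized corpus (hypothetical data, not from the paper).
toy_docs = [
    ["good", "awesome", "movie"],
    ["good", "awesome", "day"],
    ["bad", "awful", "movie"],
    ["bad", "awful", "day"],
    ["good", "nice"],
    ["bad", "sad"],
]

score_awesome = so_pmi(toy_docs, "awesome", ["good"], ["bad"])
score_awful = so_pmi(toy_docs, "awful", ["good"], ["bad"])
print(score_awesome > 0, score_awful < 0)
```

In a dictionary-expansion setting, candidates whose score exceeds some positive threshold would be added to the positive word list and those below a negative threshold to the negative list; the thresholds themselves would need to be tuned on real data.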