The Research And Implementation Of Opinion Target Phrase Extraction In Sentiment Analysis Domain

Posted on: 2018-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Wang
GTID: 2348330536952508
Subject: Software engineering
Abstract/Summary:
With the rapid development of the mobile Internet in recent years, Micro-blog has grown rapidly as an emerging social network, and massive amounts of text data are generated every day. As a major carrier of mobile social networking, Micro-blog is rich in content and its data is highly valuable. Identifying target phrases and analysing sentiment in this data provides a valuable reference for government public opinion monitoring, enterprise advertising, user behavior prediction and information-based decision-making. Micro-blog sentiment analysis mainly comprises two tasks: identifying the target phrase of a text and analysing its sentiment orientation. Because Micro-blog topics span many different aspects, identifying the target phrase has been an important and difficult task in the sentiment analysis domain. Related work shows that unknown word recognition is one of the important factors influencing the performance of target phrase recognition algorithms. Extracting target phrases on the basis of unknown word recognition in Micro-blog is therefore important and meaningful work. This thesis designs the feature vectors of an unknown word recognition model to improve the recognition rate with respect to feature extraction, classifier selection and feature template selection. The algorithm is then applied to target phrase recognition, and the experimental results are verified on real Micro-blog data.

The main work of this thesis is as follows:

1. Statistical features based on the word sequence of the text, cohesion and degree of freedom are proposed as the features for recognizing unknown words. Naive Bayes, SVM, artificial neural network, logistic regression and decision tree classifiers are compared on unknown word identification, and the artificial neural network, which gives the best recognition results, is chosen as the decision model.

2. The three labels B, I and O are introduced, and conditional random fields (CRFs) are used to transform the phrase recognition problem into a sequence labeling problem. An appropriate feature template and the unknown words recognized by the trained artificial neural network are applied in the target phrase recognition experiment.

3. Sina Micro-blog was selected as the data source, and the target phrase recognition experiment was carried out after manual annotation. The experimental results show that adding the new phrases recognized from Micro-blog improves both the precision and the recall of the CRF-based target phrase extraction algorithm, without extra manual effort. The unknown word recognition model is trained on the same labelled data set as the CRFs, which increases the feasibility of the algorithm.
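
The B/I/O scheme mentioned in the abstract can be illustrated with a minimal sketch. It only shows how annotated target phrases might be turned into per-token labels for a sequence labeler and read back out again; it does not reproduce the thesis's CRF model, feature template or word segmentation. The function names `bio_encode` and `bio_decode`, the half-open span convention and the English example sentence are all assumptions made for illustration.

```python
from typing import List, Tuple


def bio_encode(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Label each token B (begins a target phrase), I (inside one) or O (outside).

    `spans` holds half-open (start, end) token indices of annotated target
    phrases. This span format is an assumption, not the thesis's format.
    """
    labels = ["O"] * len(tokens)
    for start, end in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels


def bio_decode(tokens: List[str], labels: List[str]) -> List[List[str]]:
    """Recover target phrases from a B/I/O label sequence.

    An "I" that does not follow a "B" or another "I" is treated as "O".
    """
    phrases: List[List[str]] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                phrases.append(current)
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:
            if current:
                phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    return phrases


# Hypothetical example sentence; real input would be segmented Micro-blog text.
tokens = ["the", "battery", "life", "is", "great"]
labels = bio_encode(tokens, [(1, 3)])
phrases = bio_decode(tokens, labels)
```

In a setup like the one described, a label sequence such as `labels` would serve as the training target for the CRF, and decoding the predicted labels would yield the extracted target phrases.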
Keywords/Search Tags:Micro-blog, target phrase, unknown words, statistical feature, CRFs