| With the rapid rise of social networks and the growing number of Internet users, emerging media, represented by the Internet, have become an indispensable tool for the public to express aspirations, criticize current problems, make recommendations, and communicate effectively, as well as an important channel through which the public exercises its rights to know, to participate, to express, and to supervise. Users have thus turned from recipients of information into producers of it, and a large volume of user-generated information has accumulated on the network. User-generated information carries rich signals such as emotional attitudes and political tendencies. Mining the emotional information carried by user-generated content and analyzing users’ emotional tendencies is of great significance to product recommendation, public opinion discovery, and information prediction.
Researchers have carried out a great deal of work on orientation analysis, which has advanced the field. However, users’ emotional information is mostly embedded in user-generated text, natural language processing is itself a very challenging task, and users’ emotional information may change with context. As a result, several problems in orientation analysis urgently need to be solved:
(1) Corpus distribution is imbalanced in orientation analysis: corpora for some domains are easily obtained from the Internet, while corpora for other domains are difficult to obtain. 
How to solve the problem of unbalanced corpus distribution, so that the constructed emotional vocabulary transfers well across domains and cross-domain orientation analysis becomes achievable, is the primary problem to be solved at present.
(2) Emotional words are not only domain dependent but also context dependent, so the same emotional word can show different emotional tendencies in different contexts, which significantly reduces system accuracy. Dealing with the context dependence of emotional words is the key to improving orientation analysis.
(3) Sentences may contain negative words, comparative words, emotional words with different tendencies, and other complex language phenomena. Whether a reasonable sentence-level model can be built to capture the various factors that influence sentence orientation, and thereby improve sentence orientation analysis, is one of the problems confronting the field.
(4) Flat topic models have difficulty describing the relationships between topics and attributes in comment text, which makes it hard to fully grasp the global emotional tendency toward a given comment topic. Whether an appropriate representation model of comment text can be built to describe its longitudinal hierarchy and lateral correlations, and eventually to describe users’ final emotional tendency, is an important open issue.
In response to the problems above, this paper defined its research content and ultimately achieved breakthroughs in the following aspects. The major work is as follows:
(1) Automatic extension of emotional words across domains, and handling of the imbalanced distribution of data across domains. To address the problem of unbalanced corpora in orientation analysis, this paper proposed a cross-domain sentiment analysis method. 
In this method, we analyzed the emotional tendency of unknown words in the target domain by using the labeled information in the source domain.
The method first divided emotional words into two categories: dependent emotional words and independent emotional words. On this basis, the two assumptions of the original orientation analysis were extended and the relationship between the source domain and the target domain was constructed, so that the set of emotional words could be extended. The whole method involved two steps: emotional word extraction and emotional word orientation definition. The extraction step combined part-of-speech information with improved mutual information to calculate the dependence intensity between candidate emotional words and evaluation objects, yielding the emotional word set of the target domain.
For orientation definition, the relationships between words and words, words and evaluation objects, and words and documents were constructed and used to calculate the emotional tendency of each emotional word, ultimately achieving cross-domain extension of emotional words.
(2) Orientation analysis of evaluation phrases. A method for evaluation phrase orientation analysis based on the emotional expectations of evaluation objects was put forward. To address the context dependence of emotional words, the context of emotional words was first decomposed into evaluation objects, whose potential emotion was used to quantify the impact of evaluation objects on phrase tendency. On this basis, the relationships among evaluation objects, emotional words, and evaluation phrases were constructed. Finally, the objective function of phrase orientation analysis was constructed from heuristic rules. 
Experiments showed that incorporating the emotional expectations of evaluation objects effectively improved tendency recognition of evaluation phrases.
(3) Orientation analysis of negative sentences. For the negation phenomena that arise in sentence orientation analysis, this paper analyzed the main factors influencing the orientation analysis of negative sentences and the scope of negative words, and on this basis put forward a negative sentence orientation analysis method based on cascaded HMMs. The method was divided into three levels. HMM-1 and HMM-2 were applied to identify the evaluation objects contained in negative sentences and to define the potential emotional tendency of each evaluation object. The negative words contained in the sentences were then used as trigger conditions to correct the emotional tendency of evaluation phrases. Finally, the global tendency of the sentence was computed according to sentence rules. This method was entered in Task 1 of the fourth national orientation analysis evaluation in 2012, which addressed orientation analysis of Chinese negative sentences, and obtained the best evaluation results among all submissions.
(4) Construction of a comment text model, in order to fully capture the emotional tendencies of network users toward a particular topic or product and to overcome the difficulty of capturing global information when only evaluation attributes are used. This paper built a model for correlation detection in comment text. In this model, comment text was treated as a hierarchy. First, the comment text was divided into several individual semantic units; each semantic unit was further divided into two parts: the subject attribute and the semantic unit attribute. 
Of these, the subject attribute was used for global correlation of the same topic or product, and the semantic unit attribute was used to distinguish the relationships between topics or child attributes. To divide semantic units, the traditional Information Bottleneck (IB) method was extended according to the features of comment text; to detect correlation between related topics or products, a weighted KL method was adopted. To verify the feasibility of this approach, experiments were conducted on the TDT4 data set, and the results showed that the model could capture the correlation between identical topics/products more accurately. |
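
The abstract does not define the weighted KL measure used for correlation detection. As a rough illustration only, the sketch below assumes one plausible form: a KL divergence between the smoothed word distributions of two semantic units, with each word's term scaled by a per-word weight. The function name `weighted_kl`, the add-alpha smoothing, the default weight of 1.0, and the sample sentences are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def weighted_kl(tokens_p, tokens_q, weights=None, alpha=1.0):
    """Weighted KL divergence D_w(P || Q) between two bags of words.

    Hypothetical sketch: every word has weight 1.0 unless `weights`
    says otherwise, and add-alpha smoothing (alpha > 0) keeps all
    probabilities strictly positive so the logarithm is defined.
    """
    counts_p, counts_q = Counter(tokens_p), Counter(tokens_q)
    vocab = sorted(set(counts_p) | set(counts_q))
    total_p = sum(counts_p.values()) + alpha * len(vocab)
    total_q = sum(counts_q.values()) + alpha * len(vocab)
    weights = weights or {}
    score = 0.0
    for word in vocab:
        p = (counts_p[word] + alpha) / total_p
        q = (counts_q[word] + alpha) / total_q
        score += weights.get(word, 1.0) * p * math.log(p / q)
    return score


# Two toy semantic units (made-up text, for illustration only).
unit_a = "battery life is long".split()
unit_b = "battery drains fast".split()

print(weighted_kl(unit_a, unit_a))  # identical units give 0.0
print(weighted_kl(unit_a, unit_b))  # differing units give a positive value
```

A lower score suggests that two units are more likely to discuss the same topic or product. Because KL divergence is asymmetric, a practical detector might average `weighted_kl(a, b)` and `weighted_kl(b, a)`; whether the paper does so is not stated.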