
Sentiment Analysis Of Chinese Travel Reviews

Posted on: 2017-05-05 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Y Pei
GTID: 2308330485960343 | Subject: Computer technology
Abstract/Summary:
With the rapid growth of global tourism and rising living standards, travel is gradually becoming a common way for people to spend their leisure time. However, tourists often face one problem: how should they choose among the many attractions and hotels in a destination city? Well-known domestic and foreign travel websites provide vast amounts of information, including online reviews, to help tourists make these decisions. Conducting sentiment analysis on these online reviews is therefore of great theoretical and practical importance.
Existing domestic research on sentiment analysis remains at a superficial level. It leaves several problems unsolved: it tends to ignore the complexity and diversity of the Chinese language as well as the imbalance of review datasets. This paper addresses these problems by conducting sentiment analysis on online travel reviews. Our datasets are collected from Ctrip.com, the leading travel website in China. Because review datasets tend to be imbalanced, this paper studies both balanced and imbalanced datasets in order to eliminate the impact of this factor.
For balanced datasets, this paper proposes two improved feature-extraction algorithms: the first extracts features based on keywords and sentiment words, and the second refines features based on Chinese sentence structure. This paper also builds an SVM classification model and uses it to validate the two algorithms. Results show that the improved algorithms can extract features for travel-related attribute dimensions as well as sentiment keywords. The model also reduces the feature dimensionality and can effectively identify the sentiment orientation of very complex reviews.
For imbalanced datasets, this paper uses over-sampling algorithms to synthesize negative samples and thereby reduce the imbalance of the datasets. It also discusses the limitations of the SMOTE and BSMOTE algorithms.
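SMOTE, mentioned above, creates each synthetic minority sample by interpolating between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal Python sketch of standard SMOTE for illustration only, assuming reviews have already been converted into numeric feature vectors; it is not the implementation used in this paper, the function name `smote` and its parameters are hypothetical, and it does not include the BSMOTE or MSMOTE modifications, whose details are not given here.

```python
import math
import random


def _euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Standard SMOTE sketch: synthesize minority-class samples by interpolation.

    minority    -- list of numeric feature vectors from the minority class
                   (e.g. negative reviews); hypothetical input format
    n_synthetic -- number of new samples to generate
    k           -- number of nearest minority neighbours to interpolate toward
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # k nearest minority-class neighbours of every minority sample
    # (brute force, fine for a small sketch).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _euclidean(x, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for n in range(n_synthetic):
        i = n % len(minority)                  # use every minority sample in turn
        base = minority[i]
        nb = minority[rng.choice(neighbours[i])]
        gap = rng.random()                     # uniform in [0, 1)
        # New point on the segment from base toward the chosen neighbour.
        synthetic.append([b + gap * (v - b) for b, v in zip(base, nb)])
    return synthetic
```

For example, with hypothetical lists `positives` and `negatives`, calling `smote(negatives, len(positives) - len(negatives))` would bring a two-class training set to parity. The sketch treats isolated minority points exactly like typical ones, which relates to the limitations discussed next.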
These algorithms have known limitations: they ignore the influence of isolated points, or they use an unreasonable over-sampling rate, either of which can degrade classification performance. To address these limitations, we design the MSMOTE algorithm and compare the performance of the three algorithms. Experiments show that MSMOTE effectively improves classification performance on negative samples.
This paper presents a sentiment classification model for the tourism domain that reduces the impact of dataset imbalance and improves classification accuracy on unseen reviews. The method can help tourists quickly identify the sentiment polarity of travel reviews and provides a theoretical basis for analyzing satisfaction with destinations.
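To make the balanced-data pipeline more concrete, the sketch below pairs a keyword/sentiment-word feature extractor with a linear SVM. It is an illustration under stated assumptions, not this paper's method: reviews are assumed to be already word-segmented, the lexicons `ASPECT_WORDS` and `SENTIMENT_WORDS` are hypothetical toy examples, the sentence-structure refinement is not modelled, and the SVM is trained with Pegasos-style stochastic subgradient descent on the hinge loss, swapped in for whatever SVM implementation the paper actually used. Restricting features to the two lexicons is one simple way to keep the feature dimension small.

```python
# Hypothetical toy lexicons; the paper's actual keyword and sentiment lists are not given.
ASPECT_WORDS = ["酒店", "景点", "服务", "交通", "价格"]        # hotel, attraction, service, transport, price
SENTIMENT_WORDS = ["不错", "干净", "方便", "失望", "贵", "脏"]  # good, clean, convenient, disappointed, expensive, dirty
VOCAB = {w: i for i, w in enumerate(ASPECT_WORDS + SENTIMENT_WORDS)}


def extract_features(tokens):
    """Binary feature vector over the lexicon; tokens outside it are dropped.

    tokens -- a review that is assumed to be already word-segmented.
    """
    vec = [0.0] * len(VOCAB)
    for t in tokens:
        if t in VOCAB:
            vec[VOCAB[t]] = 1.0
    return vec


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    X -- list of feature vectors; y -- labels in {+1, -1}.
    A constant 1.0 feature is appended as a bias term (it is regularised
    like the other weights, a simplification acceptable for a sketch).
    """
    Xb = [x + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Shrink w (regulariser step), then add the hinge subgradient if violated.
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Return +1 (positive review) or -1 (negative review)."""
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1
```

A trained weight vector `w` can then score a new segmented review with `predict(w, extract_features(tokens))`.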
Keywords/Search Tags: Sentiment analysis, feature extraction, SVM categorization, imbalanced dataset, over-sampling algorithm