Study On Quality Analysis Method Of Text Comments

Posted on: 2017-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: W L Guo
Full Text: PDF
GTID: 2348330509954000
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of e-commerce, more and more people shop online. On the one hand, information asymmetry between the two parties in a trade makes it difficult for users to judge the quality of commodities online and reduces the efficiency of e-commerce market transactions. On the other hand, the volume of user comments is huge and grows year by year. In addition, consumption experience and evaluation attitude differ greatly between users, so the information content and value of comments vary widely. Extracting valuable information quickly from massive numbers of comments is therefore an urgent and important task for the e-commerce market. This paper approaches the problem from the perspective of comment quality: we construct a hierarchical structure of commodity features based on the generalization/specialization relationships between those features, analyze comment quality with reference to this hierarchy, and then recommend high-quality comments to users.
We treat the comments evaluating a commodity as a document and extract the commodity features that users care about and mention often in their comments. After extracting features with grammar parsing, we propose a topic hierarchy lattice algorithm based on FCA (THL Based on FCA, TBF) that uses the generalization/specialization relationships between commodity features to construct the commodity feature topic hierarchy lattice (Topic Hierarchy Lattice, THL). We adopt latent Dirichlet allocation (LDA) to analyze the topics in the documents and extract feature topics, each of which consists of commodity features expressed as a probability distribution. The analysis yields the document-topic and topic-word probability matrices.
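As a rough illustration of the two probability matrices that LDA produces, the sketch below uses hand-written toy values rather than a fitted model. The vocabulary, the numbers, and the helper names `word_distribution` and `top_words` are all hypothetical and do not come from the thesis. It only shows how each row is a probability distribution and how a document's word distribution follows from mixing topic-word rows by the document's topic weights.

```python
# Hypothetical toy values; none of these numbers are taken from the thesis.
vocab = ["screen", "battery", "price", "delivery"]

# Document-topic matrix: one row per document, one column per topic.
theta = [
    [0.9, 0.1],
    [0.2, 0.8],
]

# Topic-word matrix: one row per topic, one column per vocabulary word.
phi = [
    [0.5, 0.4, 0.1, 0.0],
    [0.0, 0.1, 0.4, 0.5],
]


def word_distribution(doc_row, phi):
    """Return p(w|d) = sum over topics k of theta[d][k] * phi[k][w]."""
    n_topics = len(phi)
    n_words = len(phi[0])
    return [
        sum(doc_row[k] * phi[k][w] for k in range(n_topics))
        for w in range(n_words)
    ]


def top_words(topic_row, vocab, n=2):
    """Return the n most probable words of one topic."""
    pairs = sorted(zip(topic_row, vocab), reverse=True)
    return [word for _, word in pairs[:n]]


# Word distribution of every document, obtained by mixing the topics.
doc_word = [word_distribution(row, phi) for row in theta]

for k, row in enumerate(phi):
    print("topic", k, "->", top_words(row, vocab))
```

In a real pipeline the two matrices would be estimated from the comment documents by an LDA implementation; the fixed values here only make the structure of the output visible.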
With reference to the binary relation between commodity categories and feature topics, we use formal concept analysis (FCA) to build the THL.
On the basis of the THL, we propose five factors that affect comment quality: comprehensiveness, specificity, cohesion, relativity and readability. We then design a comment recommendation model (Comment Quality Model Based on THL, CQM) to calculate a quality score for each comment. CQM combines these five factors so that it can evaluate comment quality comprehensively.
We use transaction data from JD in 2012 as the experimental dataset, comprising 116 commodity categories, 6,212 commodities and 18,415,146 comments, and we collect manually assigned comment quality scores as the test dataset. In the experiments, the MAE of CQM is 0.726, which shows that the quality scores predicted by CQM are close to the manual scores and verifies the precision of CQM in predicting comment quality. We then compare the results of CQM with four common classifiers; random forest performs best in the classification experiment, with a highest recall of 56.6%. This shows that the proposed factors are reasonable. At the same time, the classification results are also better, which shows that the CQM proposed in this paper is effective.
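The FCA step mentioned above, which starts from a binary relation between commodity categories and feature topics, can be sketched as follows. This is a naive enumeration of formal concepts on a tiny context, not the TBF algorithm from the thesis; the categories, the topic names, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical binary relation: which feature topics occur in which category.
context = {
    "phone": {"appearance", "battery", "screen"},
    "laptop": {"appearance", "battery", "screen", "keyboard"},
    "shoes": {"appearance", "comfort"},
}


def extent(attrs, context):
    """Objects (categories) that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(objs, context):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set().union(*context.values())
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Close every attribute subset and keep the distinct (extent, intent) pairs.

    Exponential in the number of attributes, so suitable only for tiny contexts.
    """
    attributes = sorted(set().union(*context.values()))
    concepts = set()
    for r in range(len(attributes) + 1):
        for subset in combinations(attributes, r):
            ext = extent(set(subset), context)
            concepts.add((ext, intent(ext, context)))
    return concepts


concepts = formal_concepts(context)

# List concepts from general (large extent) to specific (small extent).
for ext, itt in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(itt))
```

Ordering the concepts by inclusion of their extents gives the generalization/specialization hierarchy: here "appearance" is shared by every category and sits at the top, while "keyboard" applies only to the most specific concept.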
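The MAE reported above is the mean absolute difference between predicted and manually assigned scores, MAE = (1/n) * sum of |predicted_i - manual_i|. A minimal sketch of the calculation follows; the scores are hypothetical, and the scoring scale used in the thesis is not stated here.

```python
def mean_absolute_error(predicted, actual):
    """Average of the absolute differences between paired scores."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical scores, not data from the thesis.
predicted_scores = [3.5, 4.0, 2.0, 5.0]
manual_scores = [4.0, 4.0, 3.0, 4.5]

mae = mean_absolute_error(predicted_scores, manual_scores)
print(mae)
```

A lower MAE means the predicted quality scores lie closer to the manual scores on average.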
Keywords/Search Tags: comment quality, feature extraction, feature topic hierarchy lattice, LDA, FCA