Product Comment Data Tagging Based On Hierarchical AP Clustering

Posted on: 2018-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: C X Tang
Full Text: PDF
GTID: 2359330518478772
Subject: Management Science and Engineering
Abstract/Summary:
Considerable research shows that online reviews are the most important factors in consumers' purchase decisions in the virtual environment of online shopping. Online reviews also serve as feedback data that help companies improve products and understand user needs. Nevertheless, useful review data is difficult to obtain, because the volume of reviews is growing rapidly and the data itself is non-standardized and redundant. An efficient and accurate technique for extracting the effective information from online reviews is therefore urgently needed. To address the lack of standardization, this paper uses feature information extraction to map online reviews onto a unified template. To address redundancy, this paper constructs a word clustering model that filters out noise and summarizes the characteristics of the useful information. The aim is to provide a convenient and intuitive tool with which enterprises and consumers can obtain useful online review data.

The work has two main processing stages: feature information extraction, and word clustering and labeling based on the extracted feature information.

For feature information extraction, this paper first defines feature information as the pair <attribute, evaluate>; this template is used as the extraction format in all subsequent processing. The feature information extraction model consists of two core modules: an attribute value extraction model, and a POS (part-of-speech) and dependency syntactic template extraction model. In the attribute value extraction model, part of speech and implicit semantics are the main features considered. Part-of-speech filtering and weight assignment are carried out using word frequency statistics together with manually compiled empirical data. The implicit semantic features are computed using a word cloud and a seed dictionary. The weight distribution between part-of-speech and implicit semantic features is adjusted according to the matching rate against the default template under different weight assignments. In the POS and dependency syntactic template extraction model, the core processing tool is the LTP semantic analyzer, and the input parameters are the attribute values produced by the attribute value extraction model. The core processing logic is to filter out the parts of speech of all words that have a primary semantic relation with the attribute values, together with the dependency grammar relations between them. Finally, this paper completes the feature information extraction algorithm on the basis of the model constructed above.

For word clustering and labeling based on the feature information, this paper analyzes the applicability, advantages, and disadvantages of typical clustering algorithms and, on that basis, proposes a clustering model based on a hierarchical AP algorithm. The first layer of the clustering model is the k-means clustering algorithm, and the second layer is the AP clustering algorithm. The final processing step is the backtracking and labeling of the clustering results.

The training and evaluation corpora are drawn from online reviews on “Shop One”. All the models and algorithms proposed in the feature information extraction module are implemented and tested, and the clustering model based on the hierarchical AP algorithm is implemented. Comparative experiments against typical word clustering models are carried out on different test data sets. The evaluation metrics are accuracy, recall, and F value, which are internationally used evaluation measures. The final evaluation results show that the clustering model based on the hierarchical AP algorithm has the advantage on every evaluation metric and, furthermore, remains highly stable as the data volume increases.
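The two-layer clustering described above (k-means first, AP second, then labeling) can be illustrated with a minimal, standard-library-only sketch. It is not the thesis implementation: the abstract does not state the similarity measure, the AP preference, the damping factor, or the number of k-means clusters, so the choices below (negative squared Euclidean distance, median preference, damping 0.5, k = 2) are assumptions, as are the toy words and their 2-D "word vectors". Labeling is approximated here by tagging each word with the exemplar word of its AP cluster.

```python
import statistics

def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Layer 1: plain k-means with deterministic farthest-point initialisation (assumed)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels

def affinity_propagation(S, damping=0.5, iters=200):
    """Layer 2: standard AP message passing on similarity matrix S.
    Returns, for each point, the index of its exemplar."""
    n = len(S)
    if n == 1:
        return [0]
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum over i != k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    score = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if score[k] > 0]
    if not exemplars:  # fallback: keep the single best candidate
        exemplars = [max(range(n), key=score.__getitem__)]
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]

def hierarchical_ap(words, vectors, k):
    """k-means splits the words coarsely; AP refines each coarse group.
    Each word is then labelled with the exemplar word of its AP cluster."""
    coarse = kmeans(vectors, k)
    labels = {}
    for g in range(k):
        idx = [i for i, c in enumerate(coarse) if c == g]
        if not idx:
            continue
        S = [[-sqdist(vectors[a], vectors[b]) for b in idx] for a in idx]
        m = len(idx)
        if m > 1:
            # Assumed preference: median of the off-diagonal similarities.
            pref = statistics.median(S[a][b] for a in range(m) for b in range(m) if a != b)
            for a in range(m):
                S[a][a] = pref
        exemplar = affinity_propagation(S)
        for local, i in enumerate(idx):
            labels[words[i]] = words[idx[exemplar[local]]]
    return coarse, labels

# Hypothetical attribute words with made-up 2-D vectors (illustration only).
words = ["price", "cost", "discount", "coupon", "delivery", "shipping", "package", "box"]
vectors = [(0.0, 0.0), (0.2, 0.1), (2.0, 0.0), (2.1, 0.2),
           (10.0, 0.0), (10.1, 0.2), (12.0, 0.1), (12.2, 0.0)]
coarse, labels = hierarchical_ap(words, vectors, k=2)
for w in words:
    print(w, "->", labels[w])
```

Running k-means first keeps each AP similarity matrix small, since AP's message passing is quadratic in the number of points; this is one plausible motivation for the layered design, though the abstract does not state the author's reasoning.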
Keywords/Search Tags: review data, word vector, grammatical dependency, feature information extraction, clustering, labeling