| With the rapid growth of e-commerce, online shopping has become a convenient and popular way to shop, and new shopping platforms keep emerging. Meanwhile, product reviews have become an important reference for consumers making purchase decisions. However, the more heavily consumers rely on product reviews, the more fake reviews flood the online shopping environment. Consumers therefore need a powerful tool that helps them identify fake reviews quickly and efficiently. The research in this paper is carried out against that background. Its main contents and innovations are as follows. First, a Chinese corpus of e-commerce mobile phone reviews was constructed. Few Chinese corpora are available for mining fake e-commerce reviews, and no relatively complete Chinese e-commerce corpus exists to support experiments. This paper therefore collected 10,000 mobile phone reviews with a web crawler, preprocessed them, and categorized them manually; this corpus is the basis of the experimental work in this paper. Second, exploratory analysis of the e-commerce review data was conducted to extract multi-modal features. In recent years, shopping festivals have become a craze on various shopping platforms. This study explored whether the authenticity of a review is correlated with its posting time during shopping festivals and proposed a new basic feature, the festival time window. It also found a significant difference between the text length distributions of the two types of reviews. Word frequency analysis shows that the two types of reviews each have their own characteristics in sentiment tendency and textual expression. From the exploratory analysis, several features were extracted, such as the festival time window, text length, sentiment tendency, and the degree to which the brand is mentioned. The chi-square test and Spearman's rank correlation coefficient were then used to test independence between the dependent variable and each candidate feature, and the features that passed the test were selected for training the fake review mining model. Third, a model for mining fake e-commerce reviews was built with machine learning algorithms. Existing work on fake e-commerce review mining rarely treats the text content itself as a semantic feature. In this paper, the word vectors of the reviews were used as a semantic feature and fed into the model. The advantages and disadvantages of TF-IDF, Word2Vec, and BERT were first compared, and Word2Vec proved the most suitable of the three for training word vectors on the experimental corpus of this paper. The trained Word2Vec word vectors were then combined with text sentiment as the semantic feature, which was in turn combined with the basic features and keyword features. Naive Bayes, Logistic Regression, Support Vector Machine, Random Forest, and AdaBoost were each trained on these features, and the best-performing fake review mining model was finally obtained. |
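
The feature screening step described in the abstract (a chi-square test for categorical features and Spearman's rank correlation for ordinal or numeric ones) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `chi_square_2x2`, `average_ranks`, and `spearman` are hypothetical, the eight toy reviews are invented for illustration, and the chi-square helper assumes a binary feature, a binary label, and at least one review in each category.

```python
import math


def chi_square_2x2(feature, label):
    """Pearson chi-square test of independence for two 0/1 sequences."""
    n = len(feature)
    # Observed counts: table[i][j] = number of reviews with feature == i and label == j.
    table = [[0, 0], [0, 0]]
    for f, y in zip(feature, label):
        table[f][y] += 1
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Invented toy data: 1 = posted inside a festival time window, 1 = fake review.
in_festival_window = [1, 1, 1, 0, 0, 0, 1, 0]
is_fake = [1, 1, 1, 0, 0, 0, 0, 1]
text_length = [12, 15, 9, 80, 64, 120, 70, 20]

stat, p_value = chi_square_2x2(in_festival_window, is_fake)
rho = spearman(text_length, is_fake)
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
print(f"Spearman rho (text length vs. label) = {rho:.3f}")
```

In a screening step of this kind, a feature is typically kept when the test rejects independence from the label at a chosen significance level; the threshold and the exact selection rule used in the paper are not stated in the abstract.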