Research on a Multi-Feature Fusion Hotel Review Classification Algorithm Based on Neural Networks

Posted on: 2022-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: S J Zhou
GTID: 2518306779996079
Subject: Automation Technology
Abstract/Summary:
The popularity of the Internet has made online reviews a valuable and widely available source of information. With the development of e-commerce, the volume of product reviews has surged, and some of these reviews are deliberately fabricated or have no reference value. Traditional research on spam review detection relies mostly on the review text itself and does not take reviewer features into account, which results in low recognition accuracy. This thesis therefore proposes a neural-network-based spam review recognition method that integrates a global-local attention mechanism and combines multiple features. Experiments use the Yelp hotel review data set, and the proposed model integrates review text features with reviewer features to distinguish spam reviews from genuine ones.

First, for text representation: traditional word embeddings cannot handle polysemy and therefore fail to capture the semantics of the text accurately, so this thesis uses the BERT pre-trained language model. The model's training incorporates the position and sequence information of the text, and a bidirectional Transformer encoder extracts its semantic features, yielding a representation matrix of the review text. Next, to suppress noise and irrelevant words and to identify which words are most informative in the global scope, thereby capturing the global features of the text, a global attention mechanism assigns a weight to each word. When computing the context vector at each step, the global attention mechanism considers every hidden state of the encoder, producing a global-attention feature representation matrix of the text. To identify which words are most informative within a local scope, a local attention mechanism also assigns weights to words. Unlike global attention, it requires a context window: with the current word at the center of the window, only the hidden states within a certain range before and after it are considered, and a higher attention value indicates that the word carries more information. This produces a local-attention feature representation matrix of the text.

Features are then extracted from each of the two matrices with convolution kernels of three different sizes, and max pooling reduces each result to the most significant features of the text representation, giving two new matrices. The reviewer features are assembled into a one-dimensional vector and normalized. After passing through three fully connected layers, this vector is concatenated along the same dimension with the two feature representation matrices from the previous step to form a new vector that integrates multiple features. The fused vector then passes through three further fully connected layers, the last of which uses the sigmoid activation function to perform the final classification.

In this thesis, reviews and reviewers are considered together, and the influence of both on spam review recognition is taken into account. For the text, the BERT pre-trained language model provides a more accurate text representation, and the global-local attention mechanism distinguishes the importance of individual words. Compared with the traditional convolutional neural network model and several more recent models on the Yelp hotel review data set, the proposed model improves spam recognition performance to a certain extent, reaching an accuracy of 90.24%, a precision of 90.54%, a recall of 89.16%, and an F1 value of 89.84%. Ablation experiments were also conducted; their results are in line with expectations and demonstrate the validity of the model.
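The difference between the global and local attention steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the dot-product scoring function, the `query` vector, the window `radius`, and the toy hidden states are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(hidden, weights):
    """Context vector: attention-weighted sum of the hidden states."""
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]


def global_attention(hidden, query):
    """Every hidden state receives a weight and contributes to the context."""
    # Dot-product scoring is an assumption; the abstract names no scoring function.
    weights = softmax([dot(h, query) for h in hidden])
    return weights, weighted_sum(hidden, weights)


def local_attention(hidden, query, center, radius):
    """Only hidden states within `radius` of `center` receive weight."""
    lo = max(0, center - radius)
    hi = min(len(hidden), center + radius + 1)
    window_weights = softmax([dot(h, query) for h in hidden[lo:hi]])
    weights = [0.0] * len(hidden)
    weights[lo:hi] = window_weights  # positions outside the window stay at 0
    return weights, weighted_sum(hidden, weights)


# Hypothetical hidden states for a five-token review (2-dimensional for brevity).
hidden = [[0.1, 0.3], [0.9, 0.2], [0.4, 0.8], [0.7, 0.7], [0.2, 0.1]]
query = [1.0, 0.5]  # hypothetical query vector

g_weights, g_context = global_attention(hidden, query)
l_weights, l_context = local_attention(hidden, query, center=2, radius=1)
```

In this sketch, `g_weights` is nonzero at all five positions, while `l_weights` is nonzero only at positions 1 to 3, the window around the center token. Both weight vectors sum to 1, so the two context vectors are comparable in scale.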
Keywords/Search Tags:Spam identification, Neural network, Attention mechanism, Classification