
Research On Net Literature Misleading Comment Filtering Technology

Posted on: 2020-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: F Han
GTID: 2415330590463147
Subject: Engineering
Abstract/Summary:
By the end of 2018, China's online literature readership had exceeded 400 million, and readers publish a massive number of reviews every day. Many low-quality reviews are mixed in with the rest. Misleading comments are a particularly difficult kind of low-quality comment, and they harm the reading experience. How to filter misleading reviews effectively is therefore an urgent problem.

Filtering misleading reviews is a typical short text classification problem, specifically the classification of short web texts. It presents three difficulties. (1) The texts are short and carry insufficient context information. (2) Colloquial expression makes the text non-standard. (3) The sample distribution is imbalanced, because the sentiment of reviews is unevenly distributed.

In recent years, deep learning has achieved great success in many natural language processing tasks, such as text classification and machine translation. With its strengths in feature extraction and its powerful fitting capability, it offers a new perspective on short text classification. The goal of this thesis is to design an efficient filtering system for misleading reviews. Based on the characteristics of misleading reviews, and compared with traditional misleading review filtering systems, we propose the following three improvements.

(1) A vector representation method for Chinese words based on multiple contexts. Online novel reviews are irregular and general-purpose word vectors cover them poorly, so the word vectors must be trained on the review corpus itself, under the same word segmentation. However, short texts carry too little context, and word embeddings trained on them alone have limited representation ability. We therefore propose a word vector training method based on multiple contexts, which introduces n-gram features, Chinese character features and score features to improve the representation ability of the word vectors.

(2) An NB-LR (Naive Bayes-Logistic Regression) corpus expansion algorithm. Because misleading comment samples are unevenly distributed, this thesis uses the NB-LR corpus expansion algorithm to expand the corpus and reduce the time spent screening positive samples.

(3) A fusion network for review filtering based on score vectors. It consists of a score vector representation method and a fusion network model that combines a convolutional neural network with an attention model. Misleading reviews are highly correlated with the reviews' scores, so incorporating score information into the input word embedding matrix improves the text feature representation. We therefore propose the score2vec score vector representation method, which represents the score as a dense vector. The fusion network uses the convolutional neural network for its superior ability to capture local features, and the attention model to capture long-distance dependencies and to optimize the weight distribution over the global word vectors and the score vector. As a result, the fusion network improves the ability to filter misleading reviews.

Based on these three improvements, this thesis designs and implements an offline batch filtering system for misleading comments. In the training process, the system implements the word vector training model, the corpus expansion model and the deep learning model training. In the prediction process, it predicts and filters new reviews and monitors the results.
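To make improvement (2) more concrete, the following is a minimal sketch of one possible reading of NB-LR corpus expansion. The abstract does not define the algorithm, so everything here is an assumption: Naive Bayes log-count ratios are used to scale bag-of-words features, a logistic regression is trained on them, and unlabeled reviews scoring above a threshold are returned as candidate positive samples for manual screening. The whitespace tokenizer, the function names, the hyperparameters and the English toy reviews are all hypothetical, and real Chinese reviews would need a word segmenter.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical whitespace tokenizer; Chinese text needs a real segmenter.
    return text.lower().split()

def nb_log_count_ratio(docs, labels, alpha=1.0):
    """Smoothed log-ratio of token document frequency, positive vs. negative class."""
    pos, neg = Counter(), Counter()
    for doc, y in zip(docs, labels):
        (pos if y == 1 else neg).update(set(tokenize(doc)))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + alpha * len(vocab)
    neg_total = sum(neg.values()) + alpha * len(vocab)
    return {t: math.log(((pos[t] + alpha) / pos_total) /
                        ((neg[t] + alpha) / neg_total)) for t in vocab}

def featurize(doc, ratio):
    # Binary token presence scaled by the NB ratio; unseen tokens are dropped.
    return {t: ratio[t] for t in set(tokenize(doc)) if t in ratio}

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_lr(features, labels, epochs=200, lr=0.5, l2=0.01):
    """Plain SGD logistic regression over sparse dict features."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(weights.get(t, 0.0) * v for t, v in x.items())
            err = sigmoid(z) - y
            bias -= lr * err
            for t, v in x.items():
                w = weights.get(t, 0.0)
                weights[t] = w - lr * (err * v + l2 * w)
    return weights, bias

def score(doc, ratio, weights, bias):
    x = featurize(doc, ratio)
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in x.items()))

def expand_candidates(unlabeled, ratio, weights, bias, threshold=0.5):
    """Unlabeled reviews at or above the threshold, highest probability first."""
    scored = [(score(d, ratio, weights, bias), d) for d in unlabeled]
    return [d for p, d in sorted(scored, reverse=True) if p >= threshold]

# Toy labeled seed set: 1 = misleading (score contradicts text), 0 = normal.
seed_docs = ["one star but great story", "five stars but boring plot",
             "great story love it", "boring plot gave up"]
seed_labels = [1, 1, 0, 0]

ratio = nb_log_count_ratio(seed_docs, seed_labels)
weights, bias = train_lr([featurize(d, ratio) for d in seed_docs], seed_labels)
picked = expand_candidates(["love it", "one star but amazing"],
                           ratio, weights, bias)
```

In this sketch only the reviews in `picked` would be passed on for manual labeling, which is how such a pre-filter could reduce the time spent screening positive samples. The threshold controls the trade-off between how many candidates are reviewed and how many true positives are missed.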
Keywords/Search Tags: Short Text Classification, Deep Learning, Word Embedding, Convolutional Neural Network, Attention Model