Clickbait Detection Based On Deep Learning

Posted on: 2024-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: J J Shen
GTID: 2568306932454854
Subject: Data Science (Mathematics)
Abstract/Summary:
Clickbait refers to news items whose headlines use exciting, reversal-driven, suspenseful, or similar devices to lure readers into clicking, while the content itself often lacks authenticity and substance. The aim is to earn click traffic and advertising revenue. Because clickbait has such a distinctive language style, it has been studied extensively both in China and abroad. Traditional machine learning methods require extensive feature engineering and achieve relatively low accuracy, whereas deep learning has shown clear advantages in headline detection: deeper networks with larger capacity can extract richer semantic information. Deep learning approaches to clickbait detection rely on two techniques. The first is word embedding, which represents words as vectors encoding syntactic and semantic information. The second is feature extraction, which captures the syntactic and structural features of a headline so that the model can understand it and judge whether it is clickbait. However, commonly used word embedding methods ignore the part-of-speech (POS) information in clickbait and therefore do not represent its semantics well, while feature extraction methods for headline detection ignore the syntactic structure of headlines, even though features extracted from different parts of a clickbait headline need differentiated weights. This thesis therefore studies clickbait detection from the perspectives of word embedding and feature extraction, as follows.

(1) A word embedding model based on POS tagging. The word embedding methods commonly used in clickbait detection do not consider the role of POS information. As the basic grammatical attribute of a word, POS is a key feature of both words and sentences, and because clickbait often uses exaggerated and extreme wording, POS information reflects its characteristics particularly well. It is therefore worth extracting POS information from the headline through POS tagging and using it for classification. This thesis proposes a word embedding model for POS and position information, which combines a POS vector carrying POS information with a pre-trained word vector carrying position information in order to better capture the POS features of clickbait. Experiments on the Chinese and English clickbait datasets WCD and Webis-2017, respectively, verify the effectiveness of the proposed method.

(2) A Half-cyclic Bidirectional LSTM model with an attention mechanism. In recent years, attention mechanisms have shown clear advantages across many natural language processing tasks: every part of the input sequence takes part in the computation, which yields better feature extraction. In addition, because of the distinctive style of clickbait, most of its eye-catching content, such as reversals and provocative statements, is concentrated in the second half of the headline, which motivates a feature extraction network that draws more semantic information from that half. This thesis therefore proposes a Half-cyclic Bidirectional LSTM model with an attention mechanism. Feeding the second half of the headline into the recurrent loop again assigns higher weights to the parameters associated with that half, which improves classification accuracy. Experiments on the Chinese and English clickbait datasets WCD and Webis-2017 show that the Half-cyclic Bidirectional LSTM model achieves better clickbait classification results.
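The abstract states only that a POS vector is "combined" with a pre-trained word vector that carries position information. The following is a minimal pure-Python sketch of one way such an input layer could look, under three assumptions not taken from the thesis: position information is injected by adding a Transformer-style sinusoidal encoding to the word vector, each POS tag has its own trainable lookup vector, and "combined" means concatenation. The tag set, toy vectors, and names (`PosAwareEmbedding`, `positional_encoding`, `POS_TAGS`) are hypothetical, and the POS tags are assumed to come from an external tagger.

```python
import math
import random

# Hypothetical POS tag inventory; the thesis does not specify its tag set.
POS_TAGS = ["NN", "VB", "JJ", "RB", "PRP", "CD", "OTHER"]


def positional_encoding(pos, dim):
    """Sinusoidal position vector (assumed stand-in for 'position information')."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


class PosAwareEmbedding:
    """Hypothetical embedding: (word vector + position encoding) ++ POS vector."""

    def __init__(self, word_vectors, pos_dim=4, seed=0):
        self.word_vectors = word_vectors  # token -> pre-trained vector (toy dict here)
        self.word_dim = len(next(iter(word_vectors.values())))
        rng = random.Random(seed)
        # One vector per POS tag; random here, learned during training in practice.
        self.pos_table = {
            tag: [rng.uniform(-0.1, 0.1) for _ in range(pos_dim)] for tag in POS_TAGS
        }

    def embed(self, tagged_tokens):
        """Map [(word, tag), ...] to one combined vector per token."""
        out = []
        for position, (word, tag) in enumerate(tagged_tokens):
            wv = self.word_vectors.get(word, [0.0] * self.word_dim)  # OOV -> zeros
            pe = positional_encoding(position, self.word_dim)
            word_with_pos = [w + p for w, p in zip(wv, pe)]
            pos_vec = self.pos_table.get(tag, self.pos_table["OTHER"])  # unknown tag fallback
            out.append(word_with_pos + pos_vec)  # concatenation of the two parts
        return out


if __name__ == "__main__":
    vectors = {
        "shocking": [0.5, 0.1, -0.2, 0.3],
        "truth": [0.0, 0.4, 0.2, -0.1],
        "revealed": [0.3, -0.3, 0.1, 0.2],
    }
    model = PosAwareEmbedding(vectors, pos_dim=4)
    emb = model.embed([("shocking", "JJ"), ("truth", "NN"), ("revealed", "VB")])
    print(len(emb), len(emb[0]))  # 3 8
```

Concatenation keeps lexical and POS information in separate dimensions so that later layers can weight them independently; element-wise addition, as BERT does with its token and position embeddings, would be an equally plausible reading of "combines".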
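The abstract does not specify how the second half of the headline is "added into the loop". One plausible reading, assumed here, is that the second-half token vectors are appended to the input sequence so the recurrent layer processes them twice. The pure-Python sketch below chains that half-cyclic input, a bidirectional LSTM, and attention pooling into a logistic clickbait score. The weights are random and untrained, the attention is simple dot-product scoring against a learned query vector (an assumption; the thesis's attention formulation is not given), and all names (`half_cyclic`, `LSTM`, `bilstm`, `attention_pool`, `classify`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def half_cyclic(seq):
    """Assumed half-cyclic input: append the second half again (middle token included for odd lengths)."""
    return seq + seq[len(seq) // 2:]


class LSTM:
    """Minimal single-layer LSTM over lists of floats (forward pass only)."""

    def __init__(self, d_in, d_hid, rng):
        self.d_hid = d_hid
        # Rows for the input, forget, candidate, and output gates, stacked.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(d_in + d_hid)]
                  for _ in range(4 * d_hid)]
        self.b = [0.0] * (4 * d_hid)

    def run(self, xs):
        n = self.d_hid
        h, c = [0.0] * n, [0.0] * n
        outputs = []
        for x in xs:
            z = list(x) + h  # concatenate input with previous hidden state
            a = [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(self.W, self.b)]
            i = [sigmoid(v) for v in a[:n]]
            f = [sigmoid(v) for v in a[n:2 * n]]
            g = [math.tanh(v) for v in a[2 * n:3 * n]]
            o = [sigmoid(v) for v in a[3 * n:]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


def bilstm(fwd, bwd, xs):
    """Concatenate forward states with time-aligned backward states."""
    hf = fwd.run(xs)
    hb = bwd.run(xs[::-1])[::-1]
    return [a + b for a, b in zip(hf, hb)]


def attention_pool(hs, query):
    """Softmax over dot-product scores, then a weighted sum of hidden states."""
    scores = [sum(q * v for q, v in zip(query, h)) for h in hs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * h[k] for w, h in zip(weights, hs)) for k in range(len(hs[0]))]
    return pooled, weights


def classify(embeddings, d_hid=4, seed=0):
    """Return (clickbait probability, attention weights) for one headline."""
    rng = random.Random(seed)
    d_in = len(embeddings[0])
    fwd, bwd = LSTM(d_in, d_hid, rng), LSTM(d_in, d_hid, rng)
    query = [rng.uniform(-0.1, 0.1) for _ in range(2 * d_hid)]
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * d_hid)]
    hs = bilstm(fwd, bwd, half_cyclic(embeddings))
    pooled, weights = attention_pool(hs, query)
    prob = sigmoid(sum(w * p for w, p in zip(w_out, pooled)))
    return prob, weights


if __name__ == "__main__":
    rng = random.Random(42)
    title = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(5)]  # 5 tokens, 6-dim
    prob, weights = classify(title)
    print(len(weights))  # 8: 5 tokens plus the 3-token second half
```

Because the second-half tokens appear twice, they contribute extra hidden states to the attention pool, which is one simple way to give that half more influence on the prediction. A real model would learn the LSTM weights, the query vector, and the output layer by backpropagation on a labeled dataset.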
Keywords: Clickbait Detection, Natural Language Processing, Deep Neural Network, Word Embedding, Half-cyclic Bidirectional Long Short-Term Memory Network