| The rapid development of advertising has penetrated people's daily lives and affects consumer behavior, including consumption choices and attitudes. Positive advertising can stimulate consumption and help consumers purchase more suitable products, thereby promoting economic development. However, malicious and even illegal advertisements undermine the sound operation of the advertising system: through enticing wording, they mislead consumers into buying products that do not match the advertisement or that they do not need, directly causing property losses. Considering that middle-aged and elderly people are more susceptible to illegal advertisements, this thesis focuses on TV advertisements, whose audiences are mostly middle-aged and elderly, and introduces natural language processing into advertisement detection. It studies an algorithm for advertisement recognition and illegal advertisement detection based on semantic analysis, which provides technical support for advertising monitoring and follow-up processing by the relevant departments. The main contents of this thesis are as follows: Firstly, to address the lack of relevant datasets, the author collects and constructs the datasets needed for the experiments. Because the approach and experimental design are novel, dedicated datasets have to be created for the experiments. This thesis constructs an advertisement identification dataset, an advertisement domain classification dataset, and an illegal advertisement detection dataset. The coding scheme and labeling specifications of the illegal advertisement detection dataset in particular are the first in this field. In the future, a more complete dataset will be made public so that more researchers can use it. Then, to address the low flexibility of existing advertisement recognition algorithms, this thesis studies a semantic-based reinforcement training advertisement recognition algorithm. The traditional advertisement 
identification algorithm retrieves advertisements by audio or keywords, but such algorithms are easily disturbed by changes in sentence pattern or synonym replacement, which makes them inflexible. To address this, this thesis uses the Bidirectional Encoder Representations from Transformers (BERT) network to extract semantic features from the context of advertisements, uses these features to determine whether the sentence under test is an advertisement, and conducts intensive (reinforcement) training on the TV advertisement dataset. The experimental results show that the algorithm is robust for general advertisement recognition, with accuracy reaching 0.9562. The accuracy on TV commercials is 0.739 before this reinforcement training and rises to 0.919 afterwards. Finally, most existing illegal advertisement detection algorithms can only judge whether an advertisement is legal; they do not extract the illegal words or associate them with the relevant laws and regulations. This thesis studies an algorithm for detecting illegal advertising words based on named entity recognition, which identifies illegal keywords as entities. Specifically, it includes the following steps: first, the BERT pre-trained model extracts dynamic word vectors as the model input; then, a bidirectional long short-term memory (BiLSTM) network captures the context of the advertising text and outputs a score vector; finally, a conditional random field (CRF) constrains the labels to obtain the optimal label sequence. The experimental results show that the algorithm can not only extract illegal words but also identify the laws they violate. |
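
The final step of the pipeline described above, in which the conditional random field constrains the labels, can be illustrated with a small Viterbi decoder. This is only a minimal sketch under stated assumptions, not the thesis's implementation: the tag set (`O`, `B-ILL`, `I-ILL`), the transition scores, and the emission scores (standing in for the BiLSTM output) are all hypothetical values chosen for illustration.

```python
# Hypothetical BIO tag set; the thesis's actual labeling scheme is not given in the abstract.
TAGS = ["O", "B-ILL", "I-ILL"]
NEG = -1e4  # large penalty standing in for a forbidden transition

# TRANSITIONS[i][j]: score of moving from TAGS[i] to TAGS[j] (illustrative values).
TRANSITIONS = [
    [0.0, 0.0, NEG],  # O -> I-ILL is not a valid BIO sequence
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
START = [0.0, 0.0, NEG]  # a sequence cannot start with I-ILL


def viterbi(emissions):
    """Return the highest-scoring tag sequence for per-token emission scores."""
    n_tags = len(TAGS)
    score = [START[j] + emissions[0][j] for j in range(n_tags)]
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag for ending at tag j on this token.
            best_i = max(range(n_tags), key=lambda i: score[i] + TRANSITIONS[i][j])
            new_score.append(score[best_i] + TRANSITIONS[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return [TAGS[i] for i in path]


def extract_spans(tokens, tags):
    """Collect token spans labeled B-ILL followed by any number of I-ILL."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B-ILL":
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I-ILL" and current:
            current.append(tok)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Made-up emission scores (columns follow TAGS), standing in for BiLSTM output.
TOKENS = ["best", "cure", "ever"]
EMISSIONS = [
    [1.0, 0.8, 0.2],
    [0.1, 0.3, 2.0],
    [2.0, 0.1, 0.5],
]

# Picking the top tag per token independently gives the invalid sequence O, I-ILL, O.
GREEDY = [TAGS[max(range(len(TAGS)), key=lambda j: row[j])] for row in EMISSIONS]
DECODED = viterbi(EMISSIONS)
SPANS = extract_spans(TOKENS, DECODED)
print(GREEDY, DECODED, SPANS)
```

With these made-up scores, choosing the best tag for each token independently yields a sequence that starts an entity with `I-ILL`, whereas the constrained decoder returns a valid sequence from which the flagged span can be read off. In a trained model the transition scores would be learned rather than set by hand.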