
Advertisement Comments Detection Method Based On Semantic Information And Supervised Learning

Posted on: 2019-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Lin
Full Text: PDF
GTID: 2428330548479737
Subject: Computer technology
Abstract/Summary:
The spread of the Internet has made entertainment and communication convenient and efficient, and computers and mobile devices have become some of the most important everyday tools. As the number of PC and mobile users grows, the user counts and click volumes of popular websites are rising quickly. YouTube, one of the best-known video websites, hosts a large number of videos and has billions of active video creators and viewers. Because every YouTube user can comment easily, and because YouTube has adopted a monetization system that rewards video producers, the proportion of spam comments has risen quickly. Spam disturbs both video uploaders and ordinary viewers and interferes with users' normal commenting and communication.

Traditional spam detection algorithms based on bag of words (BoW) rely on words and vocabulary, which leads to high-dimensional features and complex models. As spammers upgrade their tricks, these deficiencies become more and more obvious. With these problems in mind, this dissertation presents a semantic-based spam comment detection method that draws on natural language understanding and previous work. The method first performs semantic tagging and then extracts features from the semantic information. The experiments combine these semantic features with a small number of manually extracted word features. The results show that the method reduces the feature dimensionality while reaching reasonable accuracy, and that classification remains stable even where the data lack diversity, so the approach is feasible.

Because labeled data in the real world is precious and hard to obtain, Chapter 5 uses a cooperative training (co-training) algorithm that makes full use of both labeled and unlabeled text. The subsequent experiment shows that the data added during training improves accuracy.

More specifically, this dissertation contains the following work:
(1) A summary of popular spam detection methods, including feature selection methods and classification methods.
(2) A spam detection method based on semantic information, together with a semantic feature extraction method. A classification model is built on these features, and experiments show good results.
(3) A co-training method based on both the semantic view and the BoW view, which addresses the situation of having only a small amount of labeled data and a large amount of unlabeled data.
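The co-training idea in item (3) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not describe its semantic tagger or its classifiers, so `semantic_view` below is a hypothetical stand-in built from a few hand-written cues, the Naive Bayes classifier is chosen only for brevity, and the toy comments are invented. The sketch trains one classifier per view, lets each classifier pseudo-label the unlabeled comment it is most confident about, adds that comment to the labeled pool, and repeats.

```python
import math
import re
from collections import Counter

def bow_view(text):
    """Bag-of-words view: lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def semantic_view(text):
    """Hypothetical stand-in for semantic tags (not the thesis's tagger)."""
    t = text.lower()
    tags = []
    if re.search(r"https?://|www\.", t):
        tags.append("HAS_LINK")
    if re.search(r"\b(subscribe|check out|visit|click|follow)\b", t):
        tags.append("CALL_TO_ACTION")
    if re.search(r"\b(my channel|my video|my page)\b", t):
        tags.append("SELF_PROMO")
    if re.search(r"\b(love|great|song|funny|scene|voice)\b", t):
        tags.append("ABOUT_CONTENT")
    return tags or ["NO_CUE"]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over discrete features."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for feats, y in zip(docs, labels):
            self.counts[y].update(feats)
        self.vocab = {f for c in self.classes for f in self.counts[c]}
        return self

    def predict_proba(self, feats):
        n = sum(self.prior.values())
        logp = {}
        for c in self.classes:
            # +1 in the denominator reserves mass for unseen features
            total = sum(self.counts[c].values()) + len(self.vocab) + 1
            lp = math.log(self.prior[c] / n)
            for f in feats:
                lp += math.log((self.counts[c][f] + 1) / total)
            logp[c] = lp
        m = max(logp.values())
        z = sum(math.exp(v - m) for v in logp.values())
        return {c: math.exp(v - m) / z for c, v in logp.items()}

def co_train(labeled, unlabeled, views, rounds=3, per_round=1):
    """Each view's classifier pseudo-labels its most confident pool items."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        texts, ys = zip(*labeled)
        models = [NaiveBayes().fit([v(t) for t in texts], ys) for v in views]
        for model, view in zip(models, views):
            scored = []
            for t in pool:
                proba = model.predict_proba(view(t))
                label = max(proba, key=proba.get)
                scored.append((proba[label], t, label))
            scored.sort(reverse=True)
            for _conf, t, label in scored[:per_round]:
                labeled.append((t, label))
                pool.remove(t)
    texts, ys = zip(*labeled)
    return [NaiveBayes().fit([v(t) for t in texts], ys) for v in views], labeled

def predict(models, views, text):
    """Combine the two views by summing log-posteriors."""
    score = Counter()
    for m, v in zip(models, views):
        for c, p in m.predict_proba(v(text)).items():
            score[c] += math.log(p + 1e-12)
    return max(score, key=score.get)

# Invented toy data, for illustration only.
views = [bow_view, semantic_view]
labeled = [
    ("Check out my channel and subscribe http://example.com", "spam"),
    ("Visit www.example.com for free gifts", "spam"),
    ("I love this song, her voice is great", "ham"),
    ("That scene was so funny", "ham"),
]
unlabeled = [
    "Please subscribe to my channel",
    "Great song, love it",
    "click http://example.com now",
    "funny video",
]
models, final_labeled = co_train(labeled, unlabeled, views)
print(predict(models, views, "subscribe to my channel www.example.com"))
```

In this sketch the two views play the roles of the BoW view and the semantic view named in the abstract. A practical system would also cap how many pseudo-labels are accepted per round and check them against held-out labeled data, since a confident but wrong pseudo-label is carried into every later round.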
Keywords/Search Tags: machine learning, comment filtering, semantic feature extraction, model, ensemble learning, co-training