Research On Intelligent Spam Message Recognition Algorithms Based On Deep Learning

Posted on:2020-09-12

Degree:Master

Type:Thesis

Country:China

Candidate:Y Li

GTID:2428330623456438

Subject:Software engineering

Abstract/Summary:

A spam message is an unsolicited message, sent to users without their consent, that contains commercial advertising or content that does not comply with the law. With the popularity of mobile phones, spam messages have become increasingly rampant in daily life. This has seriously affected people's daily lives and even social stability. China Mobile blocked more than 200 million spam messages in 2017, and the number is still growing. Today, each person receives an average of nine spam messages per month. The big data era has led to the accumulation of large amounts of personal information, and this data must be managed properly. Given such a large volume of SMS data, extracting meaningful information to protect users from spam while preserving a good user experience has become an urgent problem. With the rapid development of deep learning and natural language processing, the ability of deep learning models to extract information has been further confirmed. This thesis studies deep learning methods for spam message classification in depth. The research content and results are as follows.

First, preprocessing revealed that the spam message data is very noisy and that the jieba word segmenter could not recognize new words. To address this, the data is cleaned through a pipeline that includes traditional Chinese character conversion, replacement of numbers and special symbols, and typo correction. For the unrecognized new words, an improved new word recognition tool is introduced, and the words it finds are imported into jieba's custom dictionary.

Second, for spam message identification, the RCM spam message recognition model is proposed. It combines Bi-LSTM and TextCNN and addresses the problem of polysemy, in which the same expression carries different meanings. A histogram method is also used to further extract nonlinear features from the sentence vectors. These features are merged with the sensitive features extracted by TextCNN, which improves the accuracy of spam recognition to 96.81%.

Finally, building on the original two-class spam identification system, a "no processing" class is introduced to reduce the probability that non-spam messages are predicted as spam. Two methods, fixed threshold selection and difference threshold selection, are proposed to obtain a reasonable threshold for the "no processing" class. This increases the accuracy by 1.013 percentage points, to 97.823%.
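
The abstract does not spell out how the "no processing" class is decided, so the following is only a minimal sketch under assumptions, not the thesis's actual method. It assumes the classifier outputs a spam probability, that the fixed threshold rule abstains when the winning class probability is below a cutoff, and that the difference threshold rule abstains when the gap between the two class probabilities is too small. The function names, the "ham" label, and the default values 0.9 and 0.6 are all hypothetical.

```python
def classify_fixed(p_spam: float, threshold: float = 0.9) -> str:
    """Assumed fixed-threshold rule: abstain unless the winning
    class probability reaches `threshold` (hypothetical default)."""
    p_ham = 1.0 - p_spam
    if max(p_spam, p_ham) < threshold:
        return "no processing"
    return "spam" if p_spam > p_ham else "ham"


def classify_difference(p_spam: float, min_gap: float = 0.6) -> str:
    """Assumed difference-threshold rule: abstain when the two class
    probabilities are closer together than `min_gap` (hypothetical default)."""
    p_ham = 1.0 - p_spam
    if abs(p_spam - p_ham) < min_gap:
        return "no processing"
    return "spam" if p_spam > p_ham else "ham"


if __name__ == "__main__":
    # Confident, uncertain, and confidently non-spam predictions.
    for p in (0.95, 0.70, 0.05):
        print(p, classify_fixed(p), classify_difference(p))
```

Under these assumptions, a message with a spam probability of 0.70 is left unprocessed by both rules instead of being blocked, which is how an abstain class can lower the rate of legitimate messages flagged as spam. With exactly two classes whose probabilities sum to one, the two rules differ only in how the cutoff is parametrized; they behave differently when the model emits unnormalized scores or more than two classes.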

Keywords/Search Tags:

spam SMS recognition, natural language processing, RCM prediction model, preprocessing

Related items

1	Spam Filter Research And Design Based On Natural Language And Domain Ontology
2	Research And Implementation Of Key Technologies In Mathematical Natural Language Processing
3	Research On Natural Language Programming
4	Research On The Method Of Prediction Of Audit Suspects Based On Natural Language Processing Technology Under The Background Of Informationization
5	Research And Application Of Internet Web Log Preprocessing
6	Research On Spam Text Filtering Based On Deep Learning
7	Research And Application Of Intelligent Search Interface Technology Based On Natural Language Processing
8	The Methodology And Implementation Of Chinese Natural Language Query In Databases
9	Application Of Ontology Semantics-comprehension On Natural Language In Anti-spam
10	Research On Natural Language Understanding In AORBCO Model