
Study And Implementation Of Spam Filtering Technologies Based On Rules

Posted on: 2005-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: C Yin
Full Text: PDF
GTID: 2168360152955447
Subject: Communication and Information System
Abstract/Summary:
E-mail has become one of the fastest and most economical means of communication as the Internet grows ever more popular. However, alongside the useful mail they receive, users find their mailboxes filled with unsolicited advertisements, e-mail bombs, and mail-borne viruses. Users have no choice but to spend considerable time and energy deleting this junk mail. In addition, junk mail travelling over the Internet day and night congests mail servers, reduces the overall availability of the Internet, and causes great losses to mail service providers. How to pick out useful mail from a large volume of mail has therefore become our focus, and there is an urgent need to study and develop a spam filtering system.

First, building on a comprehensive understanding of the e-mail protocols, this paper discusses the primary spam filtering technologies, such as content filtering, blacklists and whitelists, and SMTP authentication. It sets out the advantages and disadvantages of each technology and summarizes spam filtering technologies as a whole.

Next, the mathematical description is discussed; it is the most important part of this paper. The technical foundations of the mathematical description are discussed first, including the transmission principle of spam, the perspectives on spam processing ("view of varia"), spam criterion rules, regular expressions, and the naive Bayes model. These foundations form the theoretical basis of the mathematical description. The definition and classification of spam are then discussed. Based on a study of the existing definitions and classifications given by various organizations and government departments, a new definition and classification of spam are proposed. They serve as the foundation and basis for judging spam.
Finally, separate mathematical descriptions are given for the e-mail header, the e-mail subject and text, and e-mail attachments.

On the basis of the mathematical description, the implementation of rule-based spam filtering technologies is discussed. The system combines content filtering with blacklist and whitelist technology to judge each message as useful mail, suspect mail, or junk mail. The detailed design of the implemented system is then discussed, and the interface and flow chart of every module are provided. Lastly, a subsystem of the anti-spam system, the dynamic blacklist, is described, and the system test results are provided.

The novelty of this work is that the header features and body features of junk mail are studied with a mathematical method. Based on the mathematical description of spam features, a mathematical method for judging whether a message is junk mail is proposed. The mathematical description of spam features is divided into the e-mail text (including the e-mail subject) and the e-mail attachment, because the great majority of junk mail has no attachment. Therefore, in the great majority of cases the attachment description is not used when the mathematical description of spam features is applied. In addition, this division is easier to understand in terms of logic.
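
As an illustration only, the three-way judgement described above (useful, suspect, or junk) could be sketched as follows. This is not the thesis's implementation: the addresses, regular-expression rules, weights, and thresholds are all hypothetical placeholders, and the thesis's actual mathematical description is not reproduced here. The sketch simply checks the sender against a whitelist and a blacklist first, then scores the subject and text against weighted content rules.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical lists and rules, chosen only for illustration.
WHITELIST = {"colleague@example.org"}
BLACKLIST = {"bulk@spam.example"}
CONTENT_RULES = [
    (re.compile(r"free\s+money", re.I), 3),
    (re.compile(r"click\s+here", re.I), 2),
    (re.compile(r"unsubscribe", re.I), 1),
]
SUSPECT_THRESHOLD = 2  # assumed cut-off for "suspect"
JUNK_THRESHOLD = 4     # assumed cut-off for "junk"


def classify(raw_message: str) -> str:
    """Return 'useful', 'suspect', or 'junk' for a raw RFC 822 style message."""
    msg = message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()

    # Header check: the whitelist and blacklist decide before any content rule.
    if sender in WHITELIST:
        return "useful"
    if sender in BLACKLIST:
        return "junk"

    # Content check on subject and text; attachments are ignored in this sketch.
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""
    text = msg.get("Subject", "") + "\n" + body
    score = sum(weight for pattern, weight in CONTENT_RULES if pattern.search(text))

    if score >= JUNK_THRESHOLD:
        return "junk"
    if score >= SUSPECT_THRESHOLD:
        return "suspect"
    return "useful"
```

Checking the lists before the content rules keeps a trusted sender from being misjudged because of wording alone; a different ordering or a combined score would be an equally valid design under other assumptions.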
Keywords/Search Tags:spam/junk, mathematical description, mail content filtering, blacklist and whitelist, E-mail protocol