Adaptive anti-spam e-mail filtering using Huffman coding and statistical learning

Posted on: 2006-06-20

Degree: M.S.

Type: Thesis

University: University of South Alabama

Candidate: Nerellapalli, Praveen R

Full Text: PDF

GTID: 2458390005991469

Subject: Computer Science

Abstract/Summary:

Unsolicited bulk e-mail, also known as spam, is a growing problem for society. This thesis presents a new anti-spam filtering strategy that (1) uses Huffman coding, a practical entropy coding technique, to adaptively encode the feature space of an e-mail collection, and (2) applies logistic regression to fit a binary classification model to the collected data. We compared our technique with Naive Bayes and K-Nearest Neighbor, and demonstrated its effectiveness with experimental results on publicly available e-mail data. We also investigated the effect of skewed spam distributions on the performance of the various techniques. Our contributions include a novel anti-spam filtering method that is both effective and practical, and an adaptive learning model that may be applied in the area of information retrieval.
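
The abstract does not say how the Huffman code is turned into features, so the following Python sketch is only one possible reading of the two-step idea, not the thesis's method. It assumes a separate Huffman code is built from word frequencies in spam and in legitimate mail, and that a single feature (the difference in mean code length per word) is fed to a logistic regression fitted by plain gradient descent. The toy messages, the function names, and the penalty for unseen words are all hypothetical.

```python
import heapq
import math
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: Huffman code length in bits} for a frequency table."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries: (weight, tie_breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in d1.items()}
        merged.update({s: d + 1 for s, d in d2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def mean_bits(tokens, lengths):
    """Mean code length per token; unseen tokens cost one bit more than the longest code."""
    unseen = max(lengths.values()) + 1  # assumption: simple penalty for unknown words
    return sum(lengths.get(t, unseen) for t in tokens) / max(len(tokens), 1)

def feature(text, ham_len, spam_len):
    """Positive when the message is cheaper to encode with the spam code."""
    tokens = text.lower().split()
    return mean_bits(tokens, ham_len) - mean_bits(tokens, spam_len)

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """One-feature logistic regression fitted by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errs = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errs, xs)) / n
        b -= lr * sum(errs) / n
    return w, b

# Hypothetical toy corpus, for illustration only.
spam = ["win cash now", "cheap pills win prize", "cash prize now"]
ham = ["meeting agenda attached", "lunch meeting tomorrow", "project agenda tomorrow"]

spam_len = huffman_code_lengths(Counter(" ".join(spam).split()))
ham_len = huffman_code_lengths(Counter(" ".join(ham).split()))

xs = [feature(m, ham_len, spam_len) for m in spam + ham]
ys = [1] * len(spam) + [0] * len(ham)
w, b = fit_logistic(xs, ys)

def predict(text):
    """Estimated probability that the message is spam."""
    return sigmoid(w * feature(text, ham_len, spam_len) + b)
```

Calling `predict("win cash prize")` gives a probability above 0.5 on this toy corpus, and `predict("meeting tomorrow")` gives one below 0.5. In this sketch the codes are built once from the training messages; an adaptive variant would rebuild or update the frequency tables as new labeled mail arrives.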

Keywords/Search Tags:

E-mail, Anti-spam, Filtering, Coding

Related items

1	Research On Auto-learning Anti-spam Services With No-labeled
2	Research Of Anti-Spam Based On Mail Purpose And Finger-Print
3	Email Security, Filtering And Inspection Techniques Studied
4	An Intelligent And Integrated Method Of Spam Filtering With Double Engines
5	Study And Implementation Of Spam Filtering Technologies Based On Rules
6	Research On Junk Mail Filtering Model Based On POP3 In MUA~2
7	Research And Implementation Of Spam Filtering System Based On The Sender Abnormal Behavior Detection
8	The Research And Implement On The Chinese Anti-Spam Filtering System Based On Advanced Winnow Algorithm
9	Analysis And Filtering Of Spam E-mail
10	The Design And Implementation Of Anti-Spam Engine Based-on Winnow