Adaptive anti-spam e-mail filtering using Huffman coding and statistical learning

Posted on: 2006-06-20
Degree: M.S.
Type: Thesis
University: University of South Alabama
Candidate: Nerellapalli, Praveen R
GTID: 2458390005991469
Subject: Computer Science
Abstract/Summary:
Unsolicited bulk e-mail, also known as spam, is a growing problem for society. This thesis presents a new anti-spam filtering strategy that (1) uses a practical entropy coding technique, Huffman coding, to adaptively encode the feature space of an e-mail collection, and (2) applies logistic regression to fit a binary classification model to the collected data. We compared our technique to Naive Bayes and K-Nearest Neighbor, and demonstrated its effectiveness with experimental results on publicly available e-mail data. We also investigated the effect of skewed spam distributions on the performance of the various techniques. Our contributions include a novel method for anti-spam filtering that is both effective and practical, and an adaptive learning model that may be applied in the area of information retrieval.
Keywords/Search Tags: E-mail, Anti-spam, Filtering, Coding
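
The two stages named in the abstract, Huffman coding followed by logistic regression, can be illustrated with a minimal Python sketch. The abstract does not say how the Huffman codes are turned into features, so the feature design here is an assumption made purely for illustration: each message is described by its mean token code length under a Huffman code built from spam and under one built from legitimate mail ("ham"). The toy messages, the unseen-token penalty, and every function name are hypothetical and do not come from the thesis.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries: (weight, tie_breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in d1.items()}
        merged.update({s: d + 1 for s, d in d2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def mean_code_length(tokens, lengths):
    # Assumption: unseen tokens cost one bit more than the longest known code.
    penalty = max(lengths.values()) + 1
    return sum(lengths.get(t, penalty) for t in tokens) / len(tokens)


def features(text, spam_lengths, ham_lengths):
    tokens = text.lower().split()
    return [mean_code_length(tokens, spam_lengths),
            mean_code_length(tokens, ham_lengths)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by full-batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b):
    """Estimated probability that feature vector x belongs to spam."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy corpus (hypothetical data, not from the thesis).
spam = ["win cash now", "cheap cash offer now", "win free offer"]
ham = ["meeting notes attached", "lunch meeting tomorrow",
       "project notes tomorrow"]

spam_lengths = huffman_code_lengths(Counter(" ".join(spam).split()))
ham_lengths = huffman_code_lengths(Counter(" ".join(ham).split()))

X = [features(m, spam_lengths, ham_lengths) for m in spam + ham]
y = [1] * len(spam) + [0] * len(ham)
w, b = fit_logistic(X, y)

p_spam = predict(features("free cash offer", spam_lengths, ham_lengths), w, b)
p_ham = predict(features("project meeting notes", spam_lengths, ham_lengths), w, b)
```

Frequent tokens receive short codes, so a message compresses better under the code built from the class it resembles, and the logistic model learns to weigh the two code lengths against each other. An adaptive variant would rebuild the two code tables as new labeled mail arrives; this sketch does not attempt that.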