| Unsolicited bulk e-mail, also known as spam, is a growing problem for society. This thesis presents a new anti-spam filtering strategy that (1) uses Huffman coding, a practical entropy coding technique, to adaptively encode the feature space of an e-mail collection, and (2) applies logistic regression to fit a binary classification model to the collected data. We compared our technique with Naive Bayes and K-Nearest Neighbor, and demonstrated its effectiveness with experimental results on publicly available e-mail data. We also investigated the effect of skewed spam distributions on the performance of these techniques. Our contributions include a novel anti-spam filtering method that is both effective and practical, and an adaptive learning model that may be applied in the area of information retrieval. |