The Study On Spam Filtering Based On Content Features

Posted on: 2011-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Lang
GTID: 2178360308473170
Subject: Computer system architecture
Abstract/Summary:
With the rapid development and popularity of the Internet, electronic mail (e-mail), an economical and efficient means of communication, has become an indispensable tool for Internet users. The flood of spam e-mail, however, causes great inconvenience to mail users. Because spammers mostly use specialized search software to harvest e-mail addresses and specialized mass-mailing software to distribute their messages, identifying spam by manual means alone is unrealistic, and technical measures must be used for anti-spam work. Effective prevention and handling of spam e-mail has therefore become an increasingly important area of research.

The main contributions of the thesis are as follows.

(1) The thesis briefly introduces the concept of spam e-mail and analyses its dangers, the background of anti-spam applications, the current status of the field, the mainstream filtering methods, and the new challenges. It summarizes the basic knowledge of spam e-mail and studies the key techniques for text-based spam filtering, including Chinese word segmentation, data representation, feature dimension reduction, and classification techniques.

(2) An attribute set for anti-spam filtering is proposed. It combines the e-mail subject with contact information and with features of a newer e-mail format, the picture (image) mail. Three classifiers are constructed using three different classification algorithms. The experimental results demonstrate that a classifier built on the proposed attribute set achieves high accuracy in spam e-mail filtering.

(3) Based on the study of content attributes in e-mail, an incremental active learning method is proposed to update the e-mail feature set dynamically. The most valuable features are selected to improve filtering accuracy. The experimental results demonstrate that the resulting filter achieves high accuracy in anti-spam filtering.
Keywords/Search Tags: spam e-mail, text segmentation, feedback, image segmentation
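
The abstract does not say which classification algorithms or which active learning criterion the thesis uses. As a rough illustration of the general idea in contribution (3), the sketch below assumes a multinomial naive Bayes filter with Laplace smoothing whose counts are updated one message at a time, and assumes uncertainty sampling (asking the user about messages scored closest to 0.5) as the feedback step. The class name, function name, and token format are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


class IncrementalSpamFilter:
    """Naive Bayes filter (an assumed stand-in, not the thesis's method)
    whose statistics can be updated one labeled message at a time."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def update(self, tokens, label):
        """Fold one labeled message into the model without retraining."""
        self.token_counts[label].update(tokens)
        self.message_counts[label] += 1

    def spam_probability(self, tokens):
        """Return P(spam | tokens) under Laplace-smoothed naive Bayes."""
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in self.LABELS:
            prior = (self.message_counts[label] + 1) / (total_messages + 2)
            total_tokens = sum(self.token_counts[label].values())
            # The extra +1 keeps the denominator positive for an empty model.
            denom = total_tokens + len(vocab) + 1
            score = math.log(prior)
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long messages.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


def most_uncertain(spam_filter, unlabeled, k=1):
    """Pick the k unlabeled messages whose score is closest to 0.5;
    these are the ones worth sending to the user for feedback."""
    return sorted(
        unlabeled,
        key=lambda toks: abs(spam_filter.spam_probability(toks) - 0.5),
    )[:k]
```

In use, each message would first be reduced to a token list (for Chinese text, after word segmentation), the filter would be seeded with a few labeled messages via `update`, and each label obtained for a message returned by `most_uncertain` would be fed back through `update`. Tokens could also encode the non-text attributes from contribution (2), for example a hypothetical `"has_image"` marker, but how the thesis represents them is not stated in the abstract.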