
Research On Algorithm Of E-Mail Filtering

Posted on: 2007-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: K Li
Full Text: PDF
GTID: 2178360185485909
Subject: Computer Science and Technology
Abstract/Summary:
E-mail plays an increasingly important role in daily life and study because it is convenient and easy to use. Every coin has two sides, however: along with that convenience comes spam. Sometimes we receive more spam than legitimate mail. Spam wastes time and valuable network resources, and the problem keeps getting worse.

At present, content-based mail filtering usually relies on rule-based approaches, statistical approaches, or a combination of the two. Rule-based approaches are simple and effective but inflexible. Statistical approaches learn from individual keywords but cannot capture the useful information carried by relationships between keywords in an e-mail.

To address the shortcomings of both methods, the author applied TEIRESIAS, a pattern discovery algorithm, to spam filtering. TEIRESIAS finds patterns separately in a spam training set and a legitimate e-mail training set. Mutual information is then used to select the useful patterns from among the spam patterns, and the selected patterns are used to filter e-mail. The author tested the TEIRESIAS-based approach on the SpamAssassin corpus and obtained good precision, recall, and F values.

The thesis covers the following:
1. The significance of spam filtering, with a comparison and analysis of spam filtering technologies.
2. An introduction to e-mail and the design of a mail filtering platform.
3. An introduction to the TEIRESIAS algorithm and a detailed description of its implementation.
4. The structure of a spam filtering system based on the TEIRESIAS algorithm, and the term selection methods used in that system.
5. A comparison of experimental results obtained with different term selection methods, including pattern database processing and mutual information.
6. A comparison with, and analysis of, earlier work on the same open corpus.
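The selection and evaluation steps described above can be illustrated with a small Python sketch. It is not the thesis implementation. It assumes the expected mutual information between pattern presence and the spam/legitimate class, a common form in text categorization (the abstract does not give the exact formula), and it treats each pattern as a literal substring, whereas real TEIRESIAS patterns may contain wildcard positions. The function names `mutual_information`, `select_patterns`, and `precision_recall_f` are hypothetical, as is the rule of keeping only patterns that are relatively more frequent in spam.

```python
import math


def mutual_information(n_spam_with, n_ham_with, n_spam, n_ham):
    """Expected mutual information (bits) between pattern presence and class.

    Assumed formula: sum over the four (present/absent, spam/ham) cells of
    p(x, y) * log2(p(x, y) / (p(x) * p(y))).
    """
    n = n_spam + n_ham
    n_with = n_spam_with + n_ham_with
    # Each cell: (joint count, pattern marginal count, class marginal count)
    cells = [
        (n_spam_with, n_with, n_spam),               # present, spam
        (n_ham_with, n_with, n_ham),                 # present, legitimate
        (n_spam - n_spam_with, n - n_with, n_spam),  # absent, spam
        (n_ham - n_ham_with, n - n_with, n_ham),     # absent, legitimate
    ]
    mi = 0.0
    for joint, pattern_total, class_total in cells:
        if joint == 0:
            continue  # a zero cell contributes nothing (limit of x*log x)
        mi += (joint / n) * math.log2(joint * n / (pattern_total * class_total))
    return mi


def select_patterns(patterns, spam_msgs, ham_msgs, k):
    """Return up to k patterns with the highest mutual information.

    Assumption: only patterns relatively more frequent in spam are kept,
    and matching is a plain substring test.
    """
    scored = []
    for pattern in patterns:
        s = sum(pattern in msg for msg in spam_msgs)
        h = sum(pattern in msg for msg in ham_msgs)
        if s * len(ham_msgs) <= h * len(spam_msgs):
            continue  # not indicative of spam
        score = mutual_information(s, h, len(spam_msgs), len(ham_msgs))
        scored.append((score, pattern))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pattern for _, pattern in scored[:k]]


def precision_recall_f(predicted, actual):
    """Precision, recall, and balanced F value for boolean spam labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value
```

A pattern that appears in every spam message and in no legitimate message reaches the maximum score for a balanced training set, while a pattern spread evenly across both classes scores zero and would be discarded.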
Keywords/Search Tags: spam filtering, pattern discovery, mutual information