
Research and Application of Spam Filtering Technology Based on the Bayesian Algorithm

Posted on: 2012-04-24 | Degree: Master | Type: Thesis
Country: China | Candidate: R Y Ma
GTID: 2218330368976259 | Subject: Computer application technology
Abstract/Summary:
Since the advent of Internet technology, e-mail has grown into one of the most widely used Internet applications and has become an indispensable means of communication. Along with this convenience, however, comes a serious problem: junk mail (spam). Spam occupies a large share of network bandwidth and often congests network traffic, so that legitimate users may be unable to connect to the Internet or to open their genuine e-mail. This wastes users' time and effort, leads to unreasonable use of Internet resources, and threatens information security on the Internet. How to filter spam and improve filtering efficiency has therefore become a primary concern for both e-mail providers and users, and research on spam filtering technology is of great significance for Internet applications.

Building on the Bayesian algorithm, and after studying the e-mail format, the e-mail transmission process, and existing spam filtering techniques, this thesis designs a filtering program aimed at Chinese spam. The program uses black/white list technology and keyword filtering as auxiliary tools, with Chinese word segmentation and the Bayesian algorithm at its core, and implements a complete spam filtering system. In this system, the Chinese word segmentation stage includes noise reduction and stop-word removal. In addition, the system learns autonomously by continually enlarging the training sample set during Bayesian filtering.

Finally, the author tests the system on e-mail sample sets and a test suite, and adopts evaluation parameters from text categorization and information statistics as the system's evaluation criteria.
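To make the filtering pipeline described above concrete, here is a minimal Java sketch (Java being the thesis's implementation language). It is not the author's code: the segmentation algorithm (forward maximum matching over a caller-supplied dictionary), the multinomial naive Bayes variant with add-one smoothing, the order of the checks, and every class and method name (`BayesSpamFilter`, `segment`, `train`, `isSpam`) are assumptions for illustration. Autonomous learning is approximated by calling `train` on each newly labelled message, so the sample set keeps growing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: black/white lists and keyword filtering as auxiliary
 * checks, then Chinese segmentation (forward maximum matching, an assumed
 * choice) with noise and stop-word removal, then multinomial naive Bayes.
 */
class BayesSpamFilter {
    private final Set<String> dictionary;   // segmentation dictionary (assumed input)
    private final Set<String> stopWords;    // stop words dropped after segmentation
    private final Set<String> blackList = new HashSet<>();
    private final Set<String> whiteList = new HashSet<>();
    private final Set<String> blockedKeywords = new HashSet<>();
    private final int maxWordLen;

    // Index 0 = legitimate mail (ham), index 1 = spam.
    private final List<Map<String, Integer>> wordCounts = new ArrayList<>();
    private final int[] tokenTotals = new int[2];
    private final int[] docCounts = new int[2];
    private final Set<String> vocabulary = new HashSet<>();

    BayesSpamFilter(Set<String> dictionary, Set<String> stopWords) {
        this.dictionary = dictionary;
        this.stopWords = stopWords;
        int max = 1;
        for (String w : dictionary) max = Math.max(max, w.length());
        this.maxWordLen = max;
        wordCounts.add(new HashMap<>());
        wordCounts.add(new HashMap<>());
    }

    void addToBlackList(String sender) { blackList.add(sender); }
    void addToWhiteList(String sender) { whiteList.add(sender); }
    void addBlockedKeyword(String keyword) { blockedKeywords.add(keyword); }

    /** Noise reduction, forward maximum matching and stop-word removal. */
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        // Noise reduction: split on anything that is not a letter or digit
        // (punctuation, whitespace, symbols); Han characters count as letters.
        for (String chunk : text.split("[^\\p{L}\\p{N}]+")) {
            int i = 0;
            while (i < chunk.length()) {
                // Try the longest dictionary word first, shrinking to one char.
                int end = Math.min(chunk.length(), i + maxWordLen);
                while (end > i + 1 && !dictionary.contains(chunk.substring(i, end))) end--;
                String word = chunk.substring(i, end);
                if (!stopWords.contains(word)) tokens.add(word);
                i = end;
            }
        }
        return tokens;
    }

    /** Adds one labelled message to the training set (incremental learning). */
    void train(String text, boolean spam) {
        int c = spam ? 1 : 0;
        docCounts[c]++;
        for (String w : segment(text)) {
            wordCounts.get(c).merge(w, 1, Integer::sum);
            tokenTotals[c]++;
            vocabulary.add(w);
        }
    }

    /** Log prior plus log likelihoods, with Laplace (add-one) smoothing. */
    private double logScore(List<String> tokens, int c) {
        double score = Math.log((docCounts[c] + 1.0) / (docCounts[0] + docCounts[1] + 2.0));
        double denom = tokenTotals[c] + vocabulary.size();
        for (String w : tokens) {
            if (!vocabulary.contains(w)) continue; // words never seen in training are ignored
            score += Math.log((wordCounts.get(c).getOrDefault(w, 0) + 1.0) / denom);
        }
        return score;
    }

    /** Auxiliary checks first, then the Bayesian decision. */
    boolean isSpam(String sender, String text) {
        if (whiteList.contains(sender)) return false;
        if (blackList.contains(sender)) return true;
        for (String k : blockedKeywords) {
            if (text.contains(k)) return true;
        }
        List<String> tokens = segment(text);
        return logScore(tokens, 1) > logScore(tokens, 0);
    }
}
```

Checking the whitelist before the blacklist and keyword list is one possible design choice, made here so that trusted senders are never blocked; the abstract does not state the order the thesis uses.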
The test results show that the system performs relatively well at filtering spam. The system was developed on the MyEclipse 6.5 platform in Java, with Oracle as the database.
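The abstract says the system is evaluated with parameters borrowed from text categorization and information statistics but does not name them. Assuming the usual ones (precision, recall, F1 and accuracy, plus the false-positive rate that matters most to mail users), a hypothetical `SpamEvaluation` helper could compute them from confusion-matrix counts, treating spam as the positive class:

```java
/**
 * Hypothetical helper: common text-categorization metrics for a spam filter
 * test run, with spam as the positive class. The thesis's exact metric set
 * is not given in the abstract; these are assumed, standard choices.
 */
final class SpamEvaluation {
    final int truePos;   // spam correctly flagged
    final int falsePos;  // legitimate mail wrongly flagged as spam
    final int trueNeg;   // legitimate mail correctly passed
    final int falseNeg;  // spam that slipped through

    SpamEvaluation(int truePos, int falsePos, int trueNeg, int falseNeg) {
        this.truePos = truePos;
        this.falsePos = falsePos;
        this.trueNeg = trueNeg;
        this.falseNeg = falseNeg;
    }

    /** Safe division: returns 0 when the denominator is 0. */
    private static double ratio(int num, int den) {
        return den == 0 ? 0.0 : (double) num / den;
    }

    /** Share of flagged messages that really are spam. */
    double precision() { return ratio(truePos, truePos + falsePos); }

    /** Share of spam that was caught. */
    double recall() { return ratio(truePos, truePos + falseNeg); }

    /** Share of legitimate mail wrongly flagged as spam. */
    double falsePositiveRate() { return ratio(falsePos, falsePos + trueNeg); }

    /** Share of all messages classified correctly. */
    double accuracy() {
        return ratio(truePos + trueNeg, truePos + falsePos + trueNeg + falseNeg);
    }

    /** Harmonic mean of precision and recall. */
    double f1() {
        double p = precision(), r = recall();
        return p + r == 0 ? 0.0 : 2 * p * r / (p + r);
    }
}
```

The four counts would come from running the filter over a labelled test suite. Reporting the false-positive rate separately is a common choice, because misfiling legitimate mail usually costs users more than letting some spam through.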
Keywords/Search Tags: junk mail, Bayesian algorithm, Chinese word segmentation, autonomous learning