| Because it is convenient, fast and inexpensive, e-mail has become one of the most widely used applications on the Internet and an essential means of communication in work and daily life. Its widespread use has brought spam with it, and spam does serious harm. Its spread consumes large amounts of network resources and can damage computer systems, because it often carries harmful content and viruses; it also costs users time and money. To counter this harm, researchers have proposed many anti-spam methods that filter spam based on keywords, black/white lists, rules, and so on. These methods are limited, however: they are not "smart" and require a great deal of manual interaction. Other researchers have proposed anti-spam models based on intelligent techniques such as Bayesian probability estimation. Naive Bayes is so simple and efficient that many algorithms have been built on it. But these algorithms usually focus on a single instance, assuming that the classifier classifies only one instance at a time, and they often need a large and relatively fixed amount of computing resources. In the real world, a large number of e-mails arrive at the mail server at the same time and must be filtered, delivered and transmitted within a short time. This dissertation therefore introduces the anytime classification model into anti-spam applications. An anytime classification model can deliver strong prediction accuracy with limited resources and can use additional resources to increase classification accuracy. In this dissertation, the origin, background and development of spam are systematically analyzed, and the dangers that spam poses to the economy, society and the network are emphasized. From this analysis, the advantages and shortcomings of existing anti-spam technologies are summarized. 
Building on a study of the machine learning theory behind Bayesian classification, this dissertation proposes several algorithms and solutions, and experiments show that they achieve better results. The research includes: 1. Based on Bayesian networks, the attributes of instances are separated into two parts, strong attributes and weak attributes, by weakening the attribute independence assumption of Naive Bayes. The Double Level Bayesian Network classification model is proposed and applied to spam filtering. 2. Existing Bayes-based anti-spam models are not effective for online applications because they need relatively fixed computing resources. For this reason, an anytime classification model is proposed. The AAPMIE (Anytime Averaged Probabilistic under Mutual Information Estimators) classification algorithm is proposed based on mutual information theory. Each attribute has its own average mutual information, and all attributes are ordered in a queue by this value. The classifier then takes attributes from the queue one by one, uses each as the super parent of a SPODE (super-parent one-dependence estimator), and updates the predicted probability of each class label. It achieves better classification results; in particular, it reduces classification error quickly in the early stage of computation. 3. Based on the Anytime Bayesian Classification Model, a new anti-spam model based on semi-naive Bayes is proposed. It selects a subset of the attributes as super parents. Experimental results show that it filters spam more effectively. 4. Based on the traditional anytime classification model, a new anytime classification model, SAAPE (Scheduling Anytime Averaged Probabilistic Estimators), is proposed, which focuses on the whole set of instances rather than a single instance. Compared with the traditional anytime classification model, SAAPE is more flexible. When the user needs a result, SAAPE can interrupt the computation and return the current result. 
When it has more time, SAAPE can use additional resources to increase classification accuracy. 5. A new anytime anti-spam system, AASS (Anytime AntiSpam System), which addresses the whole set of e-mails, is constructed. |
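
The anytime idea in item 2 (order the attributes by mutual information with the class, then add one SPODE at a time and refresh the class probabilities) can be illustrated with a minimal Python sketch. It is not the dissertation's AAPMIE implementation. The class name `AnytimeSpodeSketch`, the toy features (has link, all caps, known sender), the Laplace smoothing and the AODE-style averaging of SPODE scores are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(column, labels):
    """Empirical mutual information I(X; Y), in nats, of one attribute and the class."""
    n = len(labels)
    joint = Counter(zip(column, labels))
    px, py = Counter(column), Counter(labels)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


class AnytimeSpodeSketch:
    """Hypothetical illustration: averaged SPODEs visited in mutual-information order."""

    def fit(self, rows, labels):
        self.n = len(rows)
        self.n_attrs = len(rows[0])
        self.classes = sorted(set(labels))
        # Distinct values per attribute, needed for Laplace smoothing (an assumption).
        self.n_values = [len({r[i] for r in rows}) for i in range(self.n_attrs)]
        self.pair = Counter()    # counts of (parent, parent value, class)
        self.triple = Counter()  # counts of (parent, parent value, child, child value, class)
        for r, y in zip(rows, labels):
            for p in range(self.n_attrs):
                self.pair[(p, r[p], y)] += 1
                for j in range(self.n_attrs):
                    if j != p:
                        self.triple[(p, r[p], j, r[j], y)] += 1
        mi = [mutual_information([r[i] for r in rows], labels)
              for i in range(self.n_attrs)]
        # Queue of candidate super parents, most informative attribute first.
        self.order = sorted(range(self.n_attrs), key=lambda i: mi[i], reverse=True)
        return self

    def _spode(self, p, x, y):
        """Smoothed estimate of P(y, x_p) * prod_{j != p} P(x_j | y, x_p)."""
        c_py = self.pair[(p, x[p], y)]
        score = (c_py + 1) / (self.n + len(self.classes) * self.n_values[p])
        for j in range(self.n_attrs):
            if j != p:
                count = self.triple[(p, x[p], j, x[j], y)]
                score *= (count + 1) / (c_py + self.n_values[j])
        return score

    def posteriors(self, x):
        """Yield (SPODEs used, class posterior) after each step; stop iterating at any time."""
        totals = {y: 0.0 for y in self.classes}
        for used, p in enumerate(self.order, start=1):
            for y in self.classes:
                totals[y] += self._spode(p, x, y)
            z = sum(totals.values())
            yield used, {y: totals[y] / z for y in self.classes}


# Toy data, invented for illustration: (has_link, all_caps, known_sender).
rows = [(1, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1),
        (0, 1, 1), (0, 0, 1), (1, 0, 1), (0, 1, 0)]
labels = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]

model = AnytimeSpodeSketch().fit(rows, labels)
for used, posterior in model.posteriors((1, 1, 0)):
    print(used, posterior)
```

Because `posteriors` is a generator, the caller can stop consuming it as soon as the time budget runs out and keep the last estimate. That is the interruptible behaviour the abstract attributes to anytime classifiers. A scheduler over many e-mails, as in SAAPE, would sit on top of this and is not sketched here.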