
Research And Design Of Chinese Spam Filter System Based On Content

Posted on: 2011-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: X N Tuo
GTID: 2178360305491257
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and the widespread use of computer and communication technology, e-mail has become an important communication tool in daily life. However, spam has become a threat to system security, so anti-spam research has become a problem of global significance. This paper systematically analyzes the characteristics of Chinese junk e-mail.

For feature extraction, this paper examines how feature selection affects spam-filtering accuracy and analyzes the deficiencies of common feature selection methods. It then presents a new approach that combines several factors which together indicate a feature's ability to filter spam: the Logistic equation is applied directly to the combination of factors to compute a value for each feature, and features are selected according to that value. Experimental results show that the new approach is superior for feature selection.

In addition, this paper proposes a new filtering method, called the Forward method, which selects features based on ham (legitimate mail). Experimental results show that this method improves the ability to recognize ham, which is the bottleneck of traditional filtering techniques. The method has a weakness of its own, however: it is not precise enough at identifying spam. This paper therefore combines the Forward method with the traditional method so that each offsets the other's deficiencies, using three combination modes proposed here. The key to the combined filter is to exploit both the Forward method's ability to identify ham and the traditional technique's ability to identify spam. To improve accuracy further, this paper proposes two improvements to the Bayesian filter.
The first narrows the range of feature selection; the second improves the contents of 'spam_hash' and 'ham_hash'. Experimental results indicate that the abilities to pick out spam and to pick out ham are both notably enhanced. The combined filter was evaluated with three measures: Recall and Precision reached 97% and 98%, respectively, and the F value reached 97%. That is, over 97% of spam messages were filtered out, and 98% of the messages flagged as spam were in fact spam. These results confirm that applying the new filtering method to spam is feasible and practical.
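The abstract does not say which factors are combined or how the Logistic equation is parameterized. The following is a minimal Python sketch of logistic feature scoring under stated assumptions: messages are already segmented into word tokens, and the two factors (how unevenly a term is spread between spam and ham, and what share of all messages contain it) and the weights `w0, w1, w2` are hypothetical choices for illustration, not the thesis's.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def doc_freq(docs: Sequence[List[str]]) -> Counter:
    """Number of messages in which each token occurs at least once."""
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def logistic(z: float) -> float:
    """Logistic function 1 / (1 + e^-z), mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score_features(
    spam_docs: Sequence[List[str]],
    ham_docs: Sequence[List[str]],
    weights: Tuple[float, float, float] = (-2.0, 4.0, 2.0),  # hypothetical w0, w1, w2
) -> Dict[str, float]:
    """Score each token by passing a linear combination of factors through the logistic function."""
    df_s, df_h = doc_freq(spam_docs), doc_freq(ham_docs)
    n_s, n_h = max(len(spam_docs), 1), max(len(ham_docs), 1)
    w0, w1, w2 = weights
    scores: Dict[str, float] = {}
    for term in set(df_s) | set(df_h):
        p_s, p_h = df_s[term] / n_s, df_h[term] / n_h
        separation = abs(p_s - p_h)  # hypothetical factor 1: class imbalance of the term
        coverage = (df_s[term] + df_h[term]) / (n_s + n_h)  # hypothetical factor 2: share of messages containing it
        scores[term] = logistic(w0 + w1 * separation + w2 * coverage)
    return scores


def select_features(scores: Dict[str, float], k: int) -> List[str]:
    """Keep the k highest-scoring tokens (ties broken by token for determinism)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy pre-segmented messages (免费 free, 发票 invoice, 优惠 discount, 会议 meeting, 报告 report).
spam = [["免费", "发票", "优惠"], ["免费", "发票"], ["发票", "会议"]]
ham = [["会议", "报告"], ["会议", "报告", "优惠"], ["报告"]]
scores = score_features(spam, ham)
print(select_features(scores, 3))  # ['发票', '报告', '免费']
```

Because the logistic function is monotonic, ranking by its output gives the same order as ranking by the linear combination; its value is that scores land on a common (0, 1) scale, which makes a fixed selection threshold easy to state.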
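The names 'spam_hash' and 'ham_hash' suggest token-count tables in the style of Paul Graham's "A Plan for Spam". The sketch below assumes that design and is not the thesis's code: the class name `HashBayesFilter`, the 0.4 default for unseen tokens, the 15 most "interesting" tokens, the [0.01, 0.99] clamp and the 0.9 threshold are Graham-style choices, and the optional `vocabulary` argument is one hypothetical reading of "narrowing the range of feature selection". It omits Graham's doubling of ham counts and his minimum-count rule.

```python
import math
from collections import Counter
from typing import Iterable, Optional, Set


class HashBayesFilter:
    """Graham-style token filter built on two count tables, spam_hash and ham_hash."""

    def __init__(self, vocabulary: Optional[Set[str]] = None, top_n: int = 15) -> None:
        self.spam_hash: Counter = Counter()  # token -> occurrences in spam training messages
        self.ham_hash: Counter = Counter()   # token -> occurrences in ham training messages
        self.n_spam = 0
        self.n_ham = 0
        self.vocabulary = vocabulary  # hypothetical: if set, only these tokens are used as features
        self.top_n = top_n

    def _keep(self, token: str) -> bool:
        return self.vocabulary is None or token in self.vocabulary

    def train(self, tokens: Iterable[str], is_spam: bool) -> None:
        table = self.spam_hash if is_spam else self.ham_hash
        table.update(t for t in tokens if self._keep(t))
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def token_prob(self, token: str) -> float:
        """Estimate P(spam | token) from the two tables, clamped to [0.01, 0.99]."""
        b = min(1.0, self.spam_hash[token] / max(self.n_spam, 1))
        g = min(1.0, self.ham_hash[token] / max(self.n_ham, 1))
        if b + g == 0:
            return 0.4  # token never seen in training: slightly ham-leaning default
        return min(0.99, max(0.01, b / (b + g)))

    def spam_prob(self, tokens: Iterable[str]) -> float:
        probs = [self.token_prob(t) for t in set(tokens) if self._keep(t)]
        # Use only the most "interesting" tokens: those farthest from a neutral 0.5.
        probs = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[: self.top_n]
        if not probs:
            return 0.5  # no usable evidence either way
        prod_spam = math.prod(probs)
        prod_ham = math.prod(1.0 - p for p in probs)
        return prod_spam / (prod_spam + prod_ham)

    def is_spam(self, tokens: Iterable[str], threshold: float = 0.9) -> bool:
        return self.spam_prob(tokens) > threshold


f = HashBayesFilter()
f.train(["免费", "发票", "代开"], is_spam=True)
f.train(["发票", "优惠"], is_spam=True)
f.train(["会议", "报告"], is_spam=False)
f.train(["会议", "时间", "报告"], is_spam=False)
print(f.is_spam(["发票", "代开"]))  # True
print(f.is_spam(["会议", "报告"]))  # False
```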
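The abstract mentions three combination modes but does not describe them. As a purely hypothetical illustration of how a ham-oriented filter (such as the Forward method) and a spam-oriented filter might be combined, the sketch below shows two generic rules: a cascade that trusts the ham-oriented filter when it is confident a message is ham, and a weighted average of the two spam probabilities. Neither is claimed to be one of the thesis's modes; the cut-offs, the weight `alpha`, and the toy scorers are assumptions.

```python
from typing import Callable, List

# A scorer maps a tokenized message to an estimated P(spam | message).
Scorer = Callable[[List[str]], float]


def cascade(ham_oriented: Scorer, spam_oriented: Scorer,
            ham_cut: float = 0.1, spam_cut: float = 0.9) -> Callable[[List[str]], str]:
    """Hypothetical mode: the ham-oriented filter clears confident ham first,
    then the spam-oriented filter decides the remaining messages."""
    def classify(tokens: List[str]) -> str:
        if ham_oriented(tokens) < ham_cut:
            return "ham"  # ham-oriented filter is confident: accept without further checks
        return "spam" if spam_oriented(tokens) > spam_cut else "ham"
    return classify


def weighted(ham_oriented: Scorer, spam_oriented: Scorer,
             alpha: float = 0.5, threshold: float = 0.5) -> Callable[[List[str]], str]:
    """Hypothetical mode: weighted average of the two spam probabilities."""
    def classify(tokens: List[str]) -> str:
        p = alpha * ham_oriented(tokens) + (1.0 - alpha) * spam_oriented(tokens)
        return "spam" if p > threshold else "ham"
    return classify


# Toy stand-ins for the two filters (hypothetical scores, for illustration only).
def forward(tokens: List[str]) -> float:
    return 0.05 if "会议" in tokens else 0.5


def traditional(tokens: List[str]) -> float:
    return 0.95 if "发票" in tokens else 0.2


by_cascade = cascade(forward, traditional)
print(by_cascade(["会议", "发票"]))  # ham: the ham-oriented filter is confident
print(by_cascade(["发票"]))          # spam
```

The cascade is one way to make each filter responsible only for the class it is good at, which mirrors the motivation stated in the abstract.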
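The reported figures follow the usual definitions when spam is treated as the positive class: Recall = TP / (TP + FN), Precision = TP / (TP + FP), and, assuming the balanced F-measure, F = 2PR / (P + R). A small Python helper (the function names are illustrative) also confirms that Precision 0.98 and Recall 0.97 give an F value of about 0.975, consistent with the reported 97%.

```python
from typing import Sequence, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Spam-class Recall, Precision and F; True means 'spam'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)      # spam caught
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)  # ham wrongly flagged as spam
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)  # spam that slipped through
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision, f_measure(precision, recall)


print(round(f_measure(0.98, 0.97), 3))  # 0.975
```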
Keywords/Search Tags: Feature extraction, Logistic equation, Spam filtering, Forward method