
Study on Spam Filtering Technology Based on Bayes

Posted on: 2009-01-05; Degree: Master; Type: Thesis
Country: China; Candidate: Q M Lu
GTID: 2178360245971188; Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, e-mail has become a primary means of modern communication. At the same time, however, spam (also called "junk mail") has spread widely online, causing a great deal of trouble for users. Preventing and controlling spam effectively is therefore both important and practical. The thesis surveys a considerable body of anti-spam literature and data from China and abroad, and analyzes and summarizes existing anti-spam techniques. E-mail filtering is an important measure against spam; current filters are based mainly on IP addresses, on rules, or on content, and the latter two both operate on the message content. The thesis focuses on content-based spam filtering algorithms, which treat filtering as text categorization: the text of each message is preprocessed, and spam is then recognized by categorizing that text. The Bayesian algorithm and its categorization model are also studied in depth.
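As a rough illustration of content-based filtering treated as text categorization, the following Python sketch implements a generic multinomial naive Bayes classifier. It assumes each message has already been preprocessed and segmented into tokens (Chinese word segmentation itself is not shown). The class name `NaiveBayesSpamFilter` and the Laplace smoothing constant `alpha` are hypothetical choices made for illustration, not the thesis's own implementation.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Hypothetical multinomial naive Bayes over pre-segmented tokens."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        # Add one labelled message; counts accumulate, so training is incremental.
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1

    def log_posteriors(self, tokens):
        # Unnormalized log P(class) + sum of log P(token | class).
        # Assumes at least one message of each class has been trained.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for t in tokens:
                if t in vocab:  # tokens never seen in training are ignored
                    score += math.log((counts[t] + self.alpha) / denom)
            scores[label] = score
        return scores

    def classify(self, tokens):
        scores = self.log_posteriors(tokens)
        return max(scores, key=scores.get)


nb = NaiveBayesSpamFilter()
nb.train(["免费", "中奖", "点击"], "spam")  # "free", "win a prize", "click"
nb.train(["会议", "时间", "安排"], "ham")   # "meeting", "time", "schedule"
print(nb.classify(["免费", "点击"]))        # spam
```

Working in log space keeps long messages from underflowing to zero probability; because `train` only updates counts, the same object can keep learning from newly labelled mail.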
Through experiments, the thesis presents a detailed analysis and comparative tests of the PG Bayesian algorithm, focusing on the strengths and limitations of the naive Bayesian algorithm in anti-spam filtering. To improve the accuracy and efficiency of Chinese word segmentation, feature selection is based on the χ² (chi-square) statistic and is refined by balancing the keywords. Introducing minimum-risk decision making reduces the risk of misjudging mail, which lowers the frequency of interference with users and improves recognition efficiency. Introducing a cognition-learning algorithm increases the model's capacity for self-learning and reduces the difficulty of recognizing vector-quantity spam, so that the model can reach the desired accuracy. The thesis thus proposes a better solution for filtering vector-quantity spam, based on a minimum-risk naive Bayesian algorithm combined with cognition learning. The experiments show that the proposed method increases the spam recognition rate and, in particular, addresses these problems in spam filtering, contributing to research on filtering based on artificial intelligence.
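Two of the refinements named above, χ² feature selection and minimum-risk decision making, can be sketched generically. This sketch uses the textbook χ² statistic computed from a 2×2 document contingency table and the standard Bayes minimum-risk rule with zero loss for correct decisions. The function names, the default losses (9 for wrongly flagging legitimate mail, 1 for missing spam), and the assumption that the inputs are unnormalized naive Bayes log posteriors are illustrative only; the thesis's keyword-balancing step and its cognition-learning algorithm are not reproduced here.

```python
import math


def chi_square(A, B, C, D):
    """Chi-square score of a term for one class.
    A: class docs containing the term, B: other docs containing it,
    C: class docs without it, D: other docs without it."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    if denom == 0:
        return 0.0
    return N * (A * D - B * C) ** 2 / denom


def select_features(docs, labels, k, target="spam"):
    """Keep the k terms with the highest chi-square score (ties broken by term).
    docs: list of token lists; labels: parallel list of 'spam'/'ham' (hypothetical)."""
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        A = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == target)
        B = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != target)
        C = sum(1 for s, y in zip(doc_sets, labels) if term not in s and y == target)
        D = sum(1 for s, y in zip(doc_sets, labels) if term not in s and y != target)
        scores[term] = chi_square(A, B, C, D)
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def min_risk_decision(log_post_spam, log_post_ham,
                      loss_false_positive=9.0, loss_false_negative=1.0):
    """Bayes minimum-risk rule with zero loss for correct decisions.
    Risk of deciding 'spam' = loss_false_positive * P(ham | x);
    risk of deciding 'ham'  = loss_false_negative * P(spam | x).
    Compared in log space; the shared normalizer P(x) cancels."""
    risk_spam = math.log(loss_false_positive) + log_post_ham
    risk_ham = math.log(loss_false_negative) + log_post_spam
    return "spam" if risk_spam < risk_ham else "ham"


docs = [["免费", "中奖"], ["免费", "点击"], ["会议", "安排"], ["会议", "时间"]]
labels = ["spam", "spam", "ham", "ham"]
print(select_features(docs, labels, k=2))                # ['会议', '免费']
print(min_risk_decision(math.log(0.8), math.log(0.2)))   # ham
```

With equal losses the rule reduces to choosing the larger posterior; raising `loss_false_positive` makes the filter more reluctant to discard legitimate mail, so a message with an 80% spam posterior is still delivered under the 9:1 losses assumed here.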
Keywords/Search Tags: Bayesian algorithm, Chinese word segmentation, feature selection and extraction, minimum risk, cognition learning