Research On Chinese Spam SMS Filtering Method Based On Rough Set And Naive Bayes

Posted on:2013-02-13

Degree:Master

Type:Thesis

Country:China

Candidate:T F Cao

Full Text:PDF

GTID:2298330467453083

Subject:Computer application technology

Abstract/Summary:

Spam messages cause a range of problems and hazards, and spam message filtering has been studied extensively. The main filtering methods are blacklists and whitelists, keyword-based methods, and methods based on the message text. The first two are too simple and inflexible, whereas methods based on the message text are more efficient. Building on previous studies, this thesis proposes a feature weighting method and designs a Chinese spam message filtering system that uses Rough Set and Naive Bayes.

The main work of this thesis is as follows:

1. Several feature weighting methods are compared, and a new feature weighting method is proposed that combines the traditional TFIDF with Minimum Classification Error training to ensure classification accuracy. Experimental results demonstrate the feasibility of the method.

2. Two-stage filtering using Rough Set and Naive Bayes. In the first stage, some basic attributes and a decision attribute are extracted from the message header, the message content, and so on. Rough Set is used to train decision rules. When a test message arrives, the attributes that appear in the decision rules are extracted from it; if the message matches a decision rule, it is assigned to the corresponding class, otherwise it is passed to the second stage. In the second stage, after word segmentation and stop-word removal, the message is represented in the vector space model, where the value of each dimension is computed by a weighting formula that combines term frequency, feature entropy, and a per-term parameter trained by Minimum Classification Error. During feature selection, terms whose weights exceed a fixed threshold are selected. Finally, Naive Bayes classifies the message according to the selected terms.

3. A message corpus is constructed in XML. The characteristics and text of each message are stored as a node; XML is well suited to building a simple database of this kind.

4. Finally, a simulation system for message classification is constructed and shown to be feasible.
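The two-stage filtering in item 2 can be illustrated with a minimal sketch. It is not the thesis implementation: the rule format, the attribute name `sender_known`, the entropy factor in `class_purity`, the multiplicative weight `tf * entropy_factor * mce_param`, and the toy data are all assumptions made for illustration. The decision rules are hard-coded rather than learned by Rough Set, the Minimum Classification Error parameter defaults to 1.0 because its training is not shown, and messages are assumed to be already segmented into words with stop words removed.

```python
import math
from collections import Counter

def match_rule(attrs, rules):
    """Stage 1: return the label of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all(attrs.get(key) == value for key, value in conditions.items()):
            return label
    return None  # no rule fired; defer to stage 2

def class_purity(term, docs):
    """Assumed entropy factor: 1 - H(class | term) / log(#classes), in [0, 1]."""
    counts = Counter(label for tokens, label in docs if term in tokens)
    total = sum(counts.values())
    classes = {label for _, label in docs}
    if total == 0 or len(classes) < 2:
        return 0.0
    entropy = -sum(c / total * math.log(c / total) for c in counts.values())
    return 1.0 - entropy / math.log(len(classes))

def select_features(docs, threshold, mce_params=None):
    """Keep terms whose assumed weight tf * entropy_factor * mce_param exceeds threshold."""
    mce_params = mce_params or {}  # hypothetical per-term parameters; 1.0 if absent
    tf = Counter(tok for tokens, _ in docs for tok in tokens)
    return {
        term for term, freq in tf.items()
        if freq * class_purity(term, docs) * mce_params.get(term, 1.0) > threshold
    }

def train_nb(docs, vocab):
    """Multinomial Naive Bayes counts restricted to the selected terms."""
    labels = Counter(label for _, label in docs)
    term_counts = {label: Counter() for label in labels}
    for tokens, label in docs:
        term_counts[label].update(t for t in tokens if t in vocab)
    return labels, term_counts, vocab

def classify_nb(tokens, model):
    """Stage 2: pick the class with the highest add-one-smoothed log posterior."""
    labels, term_counts, vocab = model
    n_docs = sum(labels.values())
    best_label, best_score = None, -math.inf
    for label, n in labels.items():
        denom = sum(term_counts[label].values()) + len(vocab)
        score = math.log(n / n_docs)
        for t in tokens:
            if t in vocab:
                score += math.log((term_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def filter_message(attrs, tokens, rules, model):
    """Two-stage cascade: decision rules first, Naive Bayes as the fallback."""
    label = match_rule(attrs, rules)
    return label if label is not None else classify_nb(tokens, model)

# Toy data (hypothetical): one hand-written rule and four pre-segmented messages.
RULES = [({"sender_known": True}, "ham")]
TRAIN = [
    (["free", "prize", "click"], "spam"),
    (["prize", "winner", "click"], "spam"),
    (["meeting", "tomorrow", "lunch"], "ham"),
    (["lunch", "tomorrow", "ok"], "ham"),
]
VOCAB = select_features(TRAIN, threshold=1.0)
MODEL = train_nb(TRAIN, VOCAB)
```

With this toy data, `filter_message({"sender_known": True}, ["prize", "click"], RULES, MODEL)` is decided by the rule in the first stage, while a message from an unknown sender falls through to Naive Bayes. Running the cheap rule match first means only messages that no rule covers pay for segmentation and probabilistic classification.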

Keywords/Search Tags:

Rough Set, Naive Bayes, spam message filtering, feature weighting

Related items

1	Research On Shielding Mechanism Of Short Message Spam And Its Application
2	Application Of Improved Naive Bayes Algorithm In Spam Filtering
3	Research On Spam Text Classification Based On Improved Naive Bayes Algorithm
4	Spam Filtering Techniques, Based On Data Mining
5	Research On Content-Based Spam Filtering Technology
6	Design And Implementation Of Short Message Classification System Based On Naive Bayesian
7	Research On Spam Filtering Technologies Based On Content Characteristics Analysis
8	Study On Spam Filtering Technology Based On IMI-WNB Algorithm
9	The Research And Application Of Text Categorization Arithmetic In Spam Filtering
10	Research On Spam Filtering Technology Based On Bayesian Classification