
Research and Implementation of an Image Spam Feature Selection Algorithm

Posted on: 2011-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: W Liu
GTID: 2208360308967349
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, the spread of spam has seriously affected how people communicate and live online, and image spam in particular is increasing day by day. To stop its spread, researchers have proposed detection algorithms based on different feature sets of image spam. However, these algorithms cannot satisfy the requirements on time cost and accuracy at the same time. This dissertation analyzes the features of image spam in depth and proposes the R-ReliefF algorithm to optimize the feature set and improve the performance of the detection algorithms. On this basis, the dissertation studies a filtering algorithm for a new kind of image spam, multi-frame image spam. Finally, after analyzing the relative merits of existing image spam detection systems, the dissertation presents a multi-level image spam filtering system that combines the R-ReliefF algorithm with the multi-frame image spam detection algorithm.
First, this dissertation gives an overview of image spam detection, including the difficulties of detecting image spam, the mainstream detection algorithms, and the methods used to evaluate them. The mainstream detection methods have not achieved the desired effect. The dissertation identifies two reasons: ① the feature sets have not been optimized; ② new kinds of image spam keep appearing. It therefore argues that selecting a more effective feature set for image spam should come first.
The dissertation extracts the features of image spam that are commonly used by existing detection algorithms and builds an initial feature set. Because the features are numerous and of many kinds, the set inevitably contains irrelevant and redundant features, which degrade the performance of the algorithms.
This dissertation proposes a feature selection algorithm, the R-ReliefF algorithm, which consists of preprocessing the feature data, calculating the relevance between each feature and the class, calculating the redundancy among features, and evaluating feature subsets with RMerits. The result is an optimized feature subset that is more effective at detecting image spam and easy to extract. Experiments show that the R-ReliefF algorithm optimizes the feature set and reduces training and detection time.
The dissertation then applies the R-ReliefF algorithm to identifying a new kind of image spam, multi-frame image spam. First, inter-frame features and per-frame features are extracted to build the feature set of multi-frame image spam. Second, the feature set is constructed and selected with the R-ReliefF algorithm to obtain an effective feature subset. Third, the detection result is obtained with a machine learning algorithm. The multi-frame image spam detection algorithm identifies almost 90% of multi-frame image spam, even though multi-frame image spam contains more interfering elements than traditional single-frame image spam.
Finally, the dissertation presents a multi-level image spam filtering system based on the R-ReliefF algorithm. First, the system applies the R-ReliefF algorithm to all features of the image and obtains three classes of features: file properties, color, and text. Second, it builds three image spam detection submodules, one for each feature class. The final result is then obtained by a vote among the three submodules, and a submodule that produces a wrong result can learn by itself. Finally, the multi-level filtering system is built by combining the multi-frame image spam detection algorithm with the conventional detection algorithm. Experiments show that the filtering system identifies 97% of ordinary single-frame image spam at a small time cost, making it a practical image spam filtering system.
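The four steps the abstract names for R-ReliefF (preprocessing, feature-class relevance, feature-feature redundancy, and subset evaluation) can be illustrated with a minimal sketch. The abstract does not define R-ReliefF or RMerits, so everything below is an assumption rather than the author's method: relevance uses the basic two-class Relief update, redundancy uses absolute Pearson correlation, and the subset score borrows the merit formula from correlation-based feature selection (CFS) as a stand-in for RMerits. All function names are hypothetical.

```python
import math


def min_max_scale(X):
    """Preprocessing: rescale every feature column to [0, 1]."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in X]


def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def relief_relevance(X, y):
    """Feature-class relevance via basic Relief (one nearest hit, one nearest miss).

    Assumes X is scaled to [0, 1] and every class has at least two samples.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: manhattan(X[i], X[j]))
        miss = min(misses, key=lambda j: manhattan(X[i], X[j]))
        for f in range(d):
            # Reward separation from the other class, penalize spread within the class.
            w[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return w


def abs_pearson(a, b):
    """Feature-feature redundancy as absolute Pearson correlation (assumed measure)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return abs(cov) / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def merit(subset, rel, red):
    """CFS-style merit, used here only as a stand-in for the undefined RMerits."""
    k = len(subset)
    mean_rel = sum(rel[f] for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    mean_red = sum(red[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * mean_rel / math.sqrt(k + k * (k - 1) * mean_red)


def select_features(X, y):
    """Greedy forward search: add features while the subset merit strictly improves."""
    X = min_max_scale(X)
    d = len(X[0])
    rel = relief_relevance(X, y)
    cols = list(zip(*X))
    red = [[abs_pearson(cols[a], cols[b]) for b in range(d)] for a in range(d)]
    chosen, best = [], float("-inf")
    while len(chosen) < d:
        cand = max((f for f in range(d) if f not in chosen),
                   key=lambda f: merit(chosen + [f], rel, red))
        score = merit(chosen + [cand], rel, red)
        if score <= best:
            break
        chosen.append(cand)
        best = score
    return sorted(chosen)


if __name__ == "__main__":
    # Toy data: column 0 tracks the label, column 1 duplicates it, column 2 is noise.
    X = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
    y = [0, 0, 1, 1]
    print(select_features(X, y))  # prints [0]
```

On the toy data, the exact duplicate adds no merit and the noise column lowers it, so only column 0 is kept. A real image spam feature set would need the multi-neighbor ReliefF variant and a measured trade-off between relevance and redundancy, which the abstract does not specify.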
Keywords/Search Tags: image spam, R-ReliefF algorithm, feature selection algorithm, multi-frame image spam