Intelligent Analysis And Implement Of Compliant Information Based On Filter Technology

Posted on: 2012-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Yuan
Full Text: PDF
GTID: 2178330335499745
Subject: Computer software and theory
Abstract/Summary:
This thesis applies SMS filtering technology, built on classification algorithms from statistical theory, to the intelligent analysis of messages collected from a report platform. Data whose type attribute has a clear value guides the analysis: the features of spam messages are extracted from it and then used to analyze the data whose type is unclear. Finally, the complete result is submitted in form format to the relevant department as a theoretical basis, addressing the difficulty of analyzing report messages that are both scattered and numerous. Existing filter systems extend keyword-based text classification and have the following drawbacks: the fixed structure of the dictionary reduces flexibility; samples are taken one by one while the classifier is being built, so the classifier does not carry over to data that differs from those samples, which lowers versatility; and the reliability of the overall system is ensured without taking into account the risk in the keyword extraction process. The thesis therefore refines SMS filtering policies based on the Bayes classification algorithm in three respects, namely the system's flexibility, versatility, and precision, to arrive at a feasible and effective solution.
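As a rough illustration of the kind of Bayes classification the abstract builds on, the following is a minimal sketch of a multinomial naive Bayes message classifier with Laplace smoothing. It is not the thesis's implementation: the function names, the class labels `spam` and `normal`, and the toy training messages are all assumptions made for the example, and messages are assumed to be tokenized already.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Count labels and words from (tokens, label) pairs.

    Returns (priors, word_counts, vocab). All names here are illustrative.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(label_counts.values())
    priors = {label: n / total for label, n in label_counts.items()}
    return priors, word_counts, vocab


def posteriors(tokens, priors, word_counts, vocab):
    """Return P(label | tokens) for every label, using Laplace smoothing."""
    log_scores = {}
    for label, prior in priors.items():
        total_words = sum(word_counts[label].values())
        score = math.log(prior)
        for tok in tokens:
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        log_scores[label] = score
    # Normalize in a numerically stable way (subtract the max log score).
    top = max(log_scores.values())
    unnormalized = {l: math.exp(s - top) for l, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {l: v / z for l, v in unnormalized.items()}


def classify(tokens, priors, word_counts, vocab):
    """Pick the label with the highest posterior probability."""
    post = posteriors(tokens, priors, word_counts, vocab)
    return max(post, key=post.get)


# Hypothetical, already-tokenized training messages.
SAMPLES = [
    (["free", "prize", "win"], "spam"),
    (["win", "cash", "now"], "spam"),
    (["meeting", "at", "noon"], "normal"),
    (["lunch", "at", "noon"], "normal"),
]
MODEL = train_naive_bayes(SAMPLES)
```

Calling `classify(["win", "prize"], *MODEL)` labels the message from the word statistics of the labeled samples, which mirrors the abstract's idea of using clearly labeled data to guide the analysis of unclear data.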
The main contents are as follows:
(1) Flexibility of the system:
① Extraction of basic keywords: the basic longest-match strategy is combined with fuzzy string matching, and a search for minister words is attempted only when both of these matches fail.
② Extraction of feature words: the system's flexibility is strengthened by combining the dimension reduction used in existing SMS filters with the text-classification measures of concentration degree, dispersion degree, and within-class average degree, taking their weighted sum to raise each word's contribution to classification accuracy and to make the classification rules more comprehensive.
(2) Versatility of classification: random sampling from probability theory is applied to avoid over-fitting to the samples, which would limit the classifier.
(3) Accuracy of classification: the minimum-risk idea from two-class classification problems is extended to multi-class problems, further improving the credibility of the system in two respects: keyword extraction and the classification result. As a result, the risk in the system is minimized.
Based on this research into strategies for the intelligent analysis of report messages, a report platform system is implemented. Tests on experimental data show that it is more flexible, efficient, and accurate than the previous strategy.
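The minimum-risk idea in point (3) can be sketched as a Bayes decision rule over more than two classes: for each possible decision, sum the loss of that decision against every true class, weighted by the posterior probability, and choose the decision with the smallest expected loss. The sketch below assumes this standard formulation; the class names, the loss values, and the posterior probabilities are hypothetical and do not come from the thesis.

```python
def min_risk_decision(posterior, loss):
    """Choose the decision with the smallest expected loss.

    posterior: {true_class: P(true_class | message)}
    loss: {decision: {true_class: cost of making `decision` when the
           message really belongs to `true_class`}}
    Returns (best_decision, risks), where risks maps each decision to its
    conditional risk R(decision | message).
    """
    risks = {
        decision: sum(costs[cls] * p for cls, p in posterior.items())
        for decision, costs in loss.items()
    }
    best = min(risks, key=risks.get)
    return best, risks


# Hypothetical three-class example: posteriors from some classifier.
POSTERIOR = {"normal": 0.5, "advert": 0.3, "fraud": 0.2}

# Hypothetical loss table: missing a fraud message is the costliest error.
LOSS = {
    "normal": {"normal": 0, "advert": 1, "fraud": 5},
    "advert": {"normal": 2, "advert": 0, "fraud": 2},
    "fraud": {"normal": 1, "advert": 1, "fraud": 0},
}

DECISION, RISKS = min_risk_decision(POSTERIOR, LOSS)
```

With these assumed numbers the most probable class is `normal`, yet the minimum-risk decision is `fraud`, because the loss table penalizes a missed fraud message heavily. With a 0/1 loss table the rule reduces to picking the most probable class.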
Keywords/Search Tags: Spam Message, Intelligent Analysis, Chinese Word Segmentation, Keyword Extraction, Minimum Risk