Study Of Chinese-text-oriented Filtration Technology Based On HTTP

Posted on: 2010-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Le
Full Text: PDF
GTID: 2178360278452777
Subject: Computer software and theory
Abstract/Summary:
The 21st century is the Internet era, in which people's work, study, and daily life are closely tied to the network. Despite the convenience and efficiency it has brought, the network also poses potential risks to social morality, the legal system, and politics, largely because of the flood of inappropriate content online. This thesis studies how to build a firewall between network information and network users that intercepts inappropriate information before it reaches the user.

Web pages are an important information carrier. Targeting the HTTP protocol, this thesis proposes a Chinese-text-oriented information filtering model, focusing on the analysis of HTTP datagrams and on Chinese text matching. In view of the current state of research and the limits of existing work, the main contributions are as follows:

1. HTTP datagrams can be intercepted at two levels: user mode and kernel mode. Interception in kernel mode is powerful but technically hard to implement, because the captured data must be reassembled and restored. Interception in user mode is easy to implement, and the complete data it captures is easy to analyze. Therefore, SPI technology, which intercepts network datagrams in user mode, is used to acquire network data.

2. Text filtering is often disturbed by disguised words. This thesis proposes applying a character-coding rule in a pre-treatment scan that removes sensitive information and handles words disguised by splitting, which greatly improves filtering accuracy.

3. The speed of string matching is usually the bottleneck of text filtering. The author studies several common string-matching algorithms and finds that they all target text over small character sets and are ineffective on large character sets such as Chinese. Therefore, by improving the WM algorithm, the author proposes the CM algorithm, a fast string-matching algorithm for Chinese text.

4. To handle datagrams that contain inappropriate information, this thesis first calculates the sensitivity of the intercepted data and then disposes of it according to the strategy given by a decision tree, which reduces the interception error rate to some extent.
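
The abstract does not give the coding rule, the CM algorithm, or the decision tree, so the following is only a minimal sketch of how points 2 and 4 might fit together, under stated assumptions. It assumes the pre-treatment scan simply keeps characters in the CJK Unified Ideographs range (U+4E00 to U+9FFF), uses plain substring counting in place of the CM algorithm, and replaces the decision tree with two fixed thresholds. The word list, weights, thresholds, and function names are all hypothetical.

```python
# Hypothetical keyword weights; the thesis does not publish its word list.
SENSITIVE_WORDS = {"赌博": 3, "毒品": 5}


def is_cjk(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def pretreat(text: str) -> str:
    """Assumed pre-treatment: drop every non-CJK character so split words rejoin."""
    return "".join(ch for ch in text if is_cjk(ch))


def sensitivity(text: str) -> int:
    """Sum the weights of all keyword occurrences in the pre-treated text.

    Plain substring counting stands in for the CM algorithm here.
    """
    cleaned = pretreat(text)
    return sum(weight * cleaned.count(word)
               for word, weight in SENSITIVE_WORDS.items())


def decide(score: int) -> str:
    """Threshold rule standing in for the decision tree (thresholds are assumptions)."""
    if score >= 5:
        return "block"
    if score >= 3:
        return "mask"
    return "pass"


if __name__ == "__main__":
    for sample in ["赌*博 网站", "毒-品和赌 博", "hello"]:
        score = sensitivity(sample)
        print(repr(sample), score, decide(score))
```

Dropping every non-CJK character is a deliberately blunt rule: it rejoins words split by symbols or spaces, but it can also join characters across a genuine word boundary and produce false matches, which is one reason a graded sensitivity score may be preferable to blocking on the first hit.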
Keywords/Search Tags: Text Filtration, HTTP, SPI, Character Coding, String Matching