Study Of Chinese-text-oriented Filtration Technology Based On HTTP

Posted on:2010-08-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y Le

Full Text:PDF

GTID:2178360278452777

Subject:Computer software and theory

Abstract/Summary:

The 21st century is the Internet era, in which people's work, study, and daily life are closely tied to the network. Alongside the convenience and efficiency it brings, the network also poses potential risks to social morality, the legal system, and politics, largely because inappropriate content floods the network. This paper studies how to build a firewall between network information and network users that intercepts inappropriate information before it reaches the user.

Web pages are an important information carrier, so this paper targets HTTP and proposes a Chinese-text-oriented information filtering model. It focuses on the analysis of HTTP packets and on Chinese text matching. In view of the current state of research and existing work, the main contributions are as follows:

1. HTTP packets can be analyzed at two levels: user mode and kernel mode. Kernel-mode interception is powerful but technically hard to implement, because the intercepted data must be reassembled and restored. User-mode interception is easy to implement and yields complete data that is easy to analyze. Therefore, SPI technology for user-mode packet interception is used to acquire network data.

2. Text filtering is often disrupted by disguised words. This paper proposes using character-coding rules in a preprocessing scan that removes sensitive information and splits disguised words, which greatly improves filtering accuracy.

3. String-matching speed is usually the bottleneck of text filtering. The author studies several common string-matching algorithms and finds that they all target small character sets and are ineffective for large character sets such as Chinese. Therefore, by improving the WM algorithm, the author proposes the CM algorithm, a fast string-matching algorithm for Chinese text.

4. To handle packets that contain inappropriate information, this paper first calculates the sensitivity of the intercepted data and then disposes of it according to the strategy given by a decision tree, which reduces the interception error rate to some extent.
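
As a rough illustration of how points 2 through 4 of the abstract might fit together, the Python sketch below chains a preprocessing scan, keyword matching, and a sensitivity-based decision. The abstract gives no details of the character-coding rules, the CM algorithm, or the decision tree, so everything here is an assumption: the keyword dictionary and weights are hypothetical, preprocessing simply drops every character outside the CJK Unified Ideographs range, Python's built-in substring counting stands in for CM, and fixed thresholds stand in for the decision tree.

```python
import re

# Hypothetical keyword weights; the thesis does not publish its dictionary.
KEYWORDS = {"赌博": 3, "毒品": 5}

# Assumption: any character outside U+4E00..U+9FFF is interference inserted
# to disguise a keyword, e.g. "赌*博" or "赌 博".
NON_CJK = re.compile(r"[^\u4e00-\u9fff]+")


def preprocess(text: str) -> str:
    """Drop non-CJK characters so that split keywords become contiguous."""
    return NON_CJK.sub("", text)


def sensitivity(text: str) -> int:
    """Sum weight * occurrence count over all keywords.

    Uses naive substring counting as a stand-in for the CM algorithm.
    """
    cleaned = preprocess(text)
    return sum(weight * cleaned.count(word) for word, weight in KEYWORDS.items())


def decide(score: int) -> str:
    """Fixed thresholds standing in for the decision tree (values are arbitrary)."""
    if score >= 8:
        return "block"
    if score >= 3:
        return "mask"
    return "pass"


if __name__ == "__main__":
    for sample in ["赌*博 网站", "毒 品 和 赌-博", "今天天气很好"]:
        score = sensitivity(sample)
        print(sample, score, decide(score))
```

Stripping all non-CJK characters is deliberately crude: it also joins characters across punctuation and word boundaries, which can produce false matches, and that is one reason a graded sensitivity score is more forgiving than blocking on the first hit.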

Keywords/Search Tags:

Text Filtration, HTTP, SPI, Character Coding, String Matching

Related items

1	Study Of The Multiform Application Of HTTP Protocol
2	Study Of The Technology Of Monitoring HTTP Packets And Keyword Filtration
3	Research On Filter Algorithms For Approximate String Matching
4	Studies On String Matching Algorithm Based On The Character's Law
5	The String Pattern Searching Algorithms Based On Suffix Arrays
6	Studies On The General Parallel Method To Improve The Performance Of String Matching Algorithm
7	Research On Optimization And Application Of Fusion Text Information Association Matching Mode
8	Study On Matching Method Of Special Character String Based On Snort System
9	Approximate String Matching For Chinese Characters By Combining Filtering And Bit-parallelism
10	Research On Character Coding Based Text Steganography And Its Attack Methods