Research And Implementation Of Distributed Network Monitoring System Based On Text Mining

Posted on:2015-03-01

Degree:Master

Type:Thesis

Country:China

Candidate:H Qin

GTID:2308330473953323

Subject:Computer software and theory

Abstract/Summary:

As Internet security problems grow more severe, filtering harmful information has become unavoidable. There is an urgent need for an efficient network data monitoring platform that purifies the Internet environment and protects people from the problems caused by harmful information. However, most network monitoring products in this area still perform unsatisfactorily. Most of them rely on simple keyword matching, which is both too rigid and prone to misjudgment. Therefore, by improving the SVM classification algorithm and combining it with a distributed structure model, the thesis proposes a distributed network monitoring prototype system (DNMS) based on text mining.

Firstly, the thesis studies the relevant technologies in the field of network monitoring, including the bypass monitoring mode and the transparent bridge mode. It compares the monitoring effects of the two modes, gives the reasons for selecting the bridge mode, and then discusses centralized and distributed architectures, which leads to the final architecture design of DNMS.

Secondly, the thesis studies packet capture based on Netfilter, especially the basic principles of the Netfilter filtering framework on the Linux platform, and then discusses how to use this framework to build a firewall-like platform for packet analysis and filtering. For packet parsing, the thesis focuses on content recovery for web and e-mail traffic, covering HTTP, SMTP and POP3. In addition, the thesis studies how to extract the body text of web pages, the so-called web de-noising problem. After summarizing several common de-noising methods and taking the distribution of text within web pages into account, the thesis proposes a de-noising method based on the distribution of web text blocks.

The thesis then describes in detail the harmful information filtering method, which is the core component of DNMS.
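The idea of locating the body text through the distribution of text blocks can be illustrated with a minimal Python sketch. It is not the thesis's own algorithm, whose details the abstract does not give. It assumes a generic line-block heuristic: tags are stripped, the text length of every window of `block_size` consecutive lines is computed, and the densest contiguous run of windows is kept. The function names and the `block_size` and `threshold` parameters are hypothetical.

```python
import re
from html import unescape


def visible_text_lines(page):
    """Drop scripts, styles, comments and tags; return the text of each remaining line."""
    page = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", page)
    page = re.sub(r"(?s)<!--.*?-->", "", page)
    page = re.sub(r"<[^>]+>", "", page)
    return [unescape(line).strip() for line in page.splitlines()]


def extract_body(page, block_size=3, threshold=60):
    """Return the text of the densest run of line blocks (assumed to be the page body)."""
    lines = visible_text_lines(page)
    n = len(lines)
    # block_len[i] = number of text characters in lines i .. i + block_size - 1
    block_len = [sum(len(t) for t in lines[i:i + block_size]) for i in range(n)]
    # Edge lines shorter than the average per-line density are treated as noise.
    min_line = threshold // block_size

    best_start, best_end, best_chars = 0, 0, 0
    i = 0
    while i < n:
        if block_len[i] < threshold:
            i += 1
            continue
        # Extend the run while consecutive blocks stay above the threshold.
        j = i
        while j < n and block_len[j] >= threshold:
            j += 1
        start, end = i, min(n, j - 1 + block_size)
        # Trim short lines (menus, footers) that the windows pulled in at the edges.
        while start < end and len(lines[start]) < min_line:
            start += 1
        while end > start and len(lines[end - 1]) < min_line:
            end -= 1
        chars = sum(len(t) for t in lines[start:end])
        if chars > best_chars:
            best_start, best_end, best_chars = start, end, chars
        i = j
    return "\n".join(t for t in lines[best_start:best_end] if t)
```

Navigation bars and footers usually produce short lines, so their windows fall below the threshold while article paragraphs stay above it. The sketch relies on the page source keeping its line breaks; minified single-line HTML would need a different block definition.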
Building on a summary of previous studies, the thesis analyzes how harmful information filtering differs from ordinary binary classification, and proposes improved methods for extracting feature items and computing their weights, making them better suited to information that is poorly identifiable. The thesis also applies the improved feature extraction method to the SVM (support vector machine) classification algorithm, which yields a complete harmful information filtering framework. In addition, the thesis describes the functional modules and the implementation process of DNMS in detail.

Finally, to verify the effectiveness of the system, the thesis presents the testing of DNMS and analyzes the test results. The results show that DNMS can withstand a certain number of concurrent users, and that the filtering module can complete the harmful information filtering task given a certain number of samples.
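As a rough illustration of how weighted feature items feed a linear SVM, the following Python sketch uses plain TF-IDF weights and a Pegasos-style sub-gradient solver. Both are standard baselines chosen here as assumptions; the thesis's improved feature extraction and weighting are not described in the abstract and are not reproduced. The toy samples, function names, and hyperparameters (`lam`, `epochs`) are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy training set: +1 = harmful, -1 = benign.
TOY_SAMPLES = [
    ("win free prize now", 1),
    ("free casino bonus now", 1),
    ("casino prize win", 1),
    ("meeting schedule for monday", -1),
    ("project report due monday", -1),
    ("schedule project meeting", -1),
]


def tokenize(text):
    """Whitespace tokenizer; real Chinese or mixed text would need word segmentation."""
    return text.lower().split()


def fit_idf(token_docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(token_docs)
    df = Counter(term for doc in token_docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    """L2-normalized sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(w, vec):
    """Linear decision value w . x (no bias term in this sketch)."""
    return sum(w.get(t, 0.0) * v for t, v in vec.items())


def train_linear_svm(vectors, labels, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = {}
    order = list(range(len(vectors)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)          # decreasing learning rate
            x, y = vectors[i], labels[i]
            violated = y * score(w, x) < 1.0  # margin check uses the current weights
            shrink = 1.0 - 1.0 / step         # equals 1 - eta * lam (regularization)
            for t in w:
                w[t] *= shrink
            if violated:
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w


def train(samples):
    """Fit IDF weights and the linear SVM on (text, label) pairs."""
    token_docs = [tokenize(text) for text, _ in samples]
    idf = fit_idf(token_docs)
    vectors = [tfidf(doc, idf) for doc in token_docs]
    w = train_linear_svm(vectors, [label for _, label in samples])
    return idf, w


def is_harmful(text, idf, w):
    """Classify a text; a zero score (e.g. no known terms) is treated as benign."""
    return score(w, tfidf(tokenize(text), idf)) > 0.0
```

A call such as `idf, w = train(TOY_SAMPLES)` followed by `is_harmful("free prize win now casino", idf, w)` exercises the whole pipeline. Unlike keyword matching, the decision depends on the weighted combination of all terms, which is the property the abstract attributes to the SVM-based approach.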

Keywords/Search Tags:

text mining, distributed architecture, Netfilter filtering framework, webpage text extraction, support vector machine

Related items

1	Design And Implementation Of Content-based Webpage Collection And Classification System
2	Study On Text Category Oriented Chinese Text Mining And Its Implementation
3	Research On Filtering Algorithms Of Text Information Based On SVM
4	Research On Text Extraction Technology In Video
5	Study On Text Categorization Method Based On Support Vector Machine
6	Text Filtering Key Technologies
7	Research On Text Extraction In Natural Scene
8	Research On Text Classification Filtering Technology Based On Latent Semantic Indexing And Support Vector Machine
9	Research On Support Vector Machine Based Text Classification
10	Study On Multi-class Text Classification Based On Support Vector Machines