| Filtering harmful information has become an unavoidable problem as Internet security threats grow more severe. There is an urgent need for an efficient network data monitoring platform that purifies the Internet environment and protects people from the problems caused by harmful information. However, the practical performance of most network monitoring products in this area is still unsatisfactory. Most of them rely on simple keyword matching, which is both rigid and prone to misjudgment. Therefore, by improving the SVM classification algorithm and combining it with a distributed architecture, this thesis proposes a distributed network monitoring prototype system (DNMS) based on text mining. First, the thesis studies the relevant technologies in the field of network monitoring, including the bypass monitoring mode and the transparent bridge mode. It compares the monitoring effects of the two modes, gives the reasons for selecting the bridge mode, and then discusses centralized and distributed architectures, thereby finalizing the architecture design of DNMS. Second, the thesis studies packet capture based on Netfilter, in particular the basic principles of the Netfilter filtering framework on the Linux platform, and then discusses how to use this framework to build a firewall-like packet analysis and filtering platform. For packet parsing, the thesis focuses on content recovery for web and e-mail traffic, covering HTTP, SMTP, and POP3. In addition, the thesis studies how to extract the main body of web pages, the so-called web de-noising problem. After summarizing several common de-noising methods and taking into account how text is distributed within web pages, the thesis proposes a de-noising method based on the distribution of web text blocks. The thesis then describes in detail the harmful information filtering method, which is the core component of DNMS. 
Building on the results of previous studies, the thesis analyzes the particular characteristics of harmful information filtering compared with binary classification and proposes improved methods for feature extraction and feature weight computation that are better suited to information that is hard to identify. The thesis also applies the improved feature extraction method to the SVM (support vector machine) classification algorithm, forming a complete harmful information filtering framework. In addition, the thesis describes the functional modules and the implementation process of DNMS in detail. Finally, to verify the effectiveness of the system, the thesis presents the testing of DNMS and analyzes the test results. The results show that DNMS can withstand a certain number of concurrent users and that the filtering module can complete the harmful information filtering task when a certain number of samples is available. |