Research And Implementation Of Web Classification Based On SVM Algorithm

Posted on:2011-04-28

Degree:Master

Type:Thesis

Country:China

Candidate:R R Chen

Full Text:PDF

GTID:2178360308961009

Subject:Applied Mathematics

Abstract/Summary:

With the global spread of the Internet, the world has entered a high-speed information age. Information on the Internet is increasing sharply, and people can conveniently browse and share a large volume of network resources. At the same time, however, negative and unhealthy content is growing rapidly, which affects national stability and unity. It is hoped that by identifying Web content, classifying Web pages, and filtering URLs, user behavior on the Internet can be controlled and a harmonious, clean network environment can be created. As research and applications have deepened, Web classification has become an important research direction in data mining. This thesis mainly studies Web classification algorithms and improves the SVM algorithm, which is applied to a telecom project based on Security Internet Gateway (SIG) and Unified Threat Management (UTM). The specific contents are as follows:

(1) The Web classification model is studied. The whole process of the model is examined by analyzing data sources, pre-processing HTML, segmenting words, and extracting and training feature words.

(2) Classification algorithms including decision trees, K-nearest neighbor (KNN), and Naive Bayes are studied. The thesis introduces the binary tree algorithm, which is typical of decision trees; the Naive Bayes algorithm, which is based on a probabilistic model; and the KNN algorithm, which is widely applied to small text samples.

(3) The thesis focuses on the SVM algorithm, which is based on statistical theory and is applied to high-dimensional spaces. Taking the wide range of Web information into account, and given that SVM multi-classification algorithms have recently been widely verified, SVM multi-classification algorithms are compared and an incremental learning algorithm is discussed.

(4) For classifier training, the kernel function of the SVM multi-classifier is modified with strong support from statistical theory, and the optimal classifier is ultimately obtained.

Because the actual classification process is an incremental learning process, a single SVM algorithm can cause a re-classification problem or an empty-classification problem. We therefore improved the traditional SVM algorithm by combining it with the highly efficient KNN algorithm to filter URLs. Experiments show that the improved SVM algorithm enhances both precision and recall, effectively filtering unhealthy URLs and cleaning Web content to achieve a "green internet."...
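
The abstract does not spell out how SVM and KNN are combined, so the following is only a minimal sketch under stated assumptions, not the thesis's actual method. It assumes that "re-classification" means several one-vs-rest SVM classifiers accept the same page, that "empty-classification" means none of them does, and that KNN is used as a tie-breaker in those two cases. The linear scoring function, the (weights, bias) model layout, and all function names are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def linear_score(weights, bias, x):
    """Decision value of one linear classifier: w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def knn_label(x, train, k=3, candidates=None):
    """Majority label among the k training vectors nearest to x.

    If `candidates` is given, only training samples with those labels vote.
    """
    pool = [(vec, label) for vec, label in train
            if candidates is None or label in candidates]
    nearest = sorted(pool, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify(x, ovr_models, train, k=3):
    """Assumed combination rule (not taken from the thesis):

    - exactly one one-vs-rest classifier accepts x: return its label;
    - several accept x ("re-classification"): KNN votes among those labels;
    - none accepts x ("empty-classification"): KNN votes among all labels.
    """
    positive = [label for label, (w, b) in ovr_models.items()
                if linear_score(w, b, x) > 0]
    if len(positive) == 1:
        return positive[0]
    return knn_label(x, train, k, candidates=positive or None)


if __name__ == "__main__":
    # Hypothetical two-class toy data with 2-D feature vectors.
    models = {"news": ((1.0, 0.0), -0.5), "sports": ((0.0, 1.0), -0.5)}
    samples = [((1.0, 0.0), "news"), ((0.9, 0.1), "news"),
               ((0.0, 1.0), "sports"), ((0.1, 0.9), "sports")]
    for page in [(0.9, 0.0), (0.2, 0.3), (0.8, 0.9)]:
        print(page, classify(page, models, samples))
```

In this sketch KNN only runs when the SVM stage is ambiguous, so the more expensive neighbor search is limited to the pages where the one-vs-rest classifiers disagree or abstain.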

Keywords/Search Tags:

Web, classification, SVM, KNN, green internet, pre-classification

Related items

1	Research On Internet Traffic QoS-related Feature Selection And Classification
2	Research On Internet Meme Classification Based On Deep Learning
3	Symbiosis Local Binary Model And Its Application
4	Design And Implementation Of Green Internet Management System For Mobile Internet
5	Research On Automatic Classification Methods Of Feature Information Based On Panchromatic Remote Sensing Images
6	In-depth Report: "Green" Is Surging
7	Support Vector Machine Based Analysis And Classification Of Internet Video Flows
8	Classification Model And Application Research Of Internet Text Data
9	Research On The Improved BEL Classification Model And Its Application In The Internet Of Things
10	Position-independent Classification For Multiple Stationary Targets Embedded In Closed Environment