Web Noise Recognition And Eliminating Methods Research

Posted on:2012-02-27

Degree:Master

Type:Thesis

Country:China

Candidate:C Qin

GTID:2218330368488468

Subject:Computer software and theory

Abstract/Summary:

As information on the Web has grown explosively, the Internet has become an important source of information. Web pages contain a great deal of navigation links, advertisements, copyright notices, questionnaires, and related-link blocks. This content is usually not what the user is looking for, and it is called "web noise". People who look up content through information retrieval software such as search engines want the results to be closely related to their search conditions (keywords, etc.) and ideally to contain little or no web noise. Web noise identification and elimination has therefore become an important research topic in network information retrieval in recent years.
This paper first introduces the concepts and architecture of web pages, then discusses and analyzes existing web noise identification and elimination methods, and on that basis proposes a new method for web noise recognition and elimination. The basic idea is to generate a DOM tree from the contents of a web page, identify web noise from the information the DOM tree provides according to a set of rules, and build a representation model of the suspicious web noise. During information retrieval, the information supplied by this suspicious web noise representation model is retrieved with the vector space model (VSM), and the results of the similarity calculation determine which noise is finally removed from the page.
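The abstract does not spell out the DOM rules, so the following is only a minimal Python sketch of the general idea under assumptions of our own. It builds a simplified DOM tree with the standard-library `html.parser` module, then flags a block-level node as suspicious noise when its link density (the share of its words that sit inside `<a>` elements) reaches a threshold or its text contains a cue word such as "copyright". The names `Node`, `DomBuilder`, `suspicious_blocks`, `LINK_DENSITY_THRESHOLD`, `NOISE_CUES`, and `BLOCK_TAGS` are hypothetical and are not taken from the thesis.

```python
from html.parser import HTMLParser

# Hypothetical rule parameters: the thesis does not publish its actual rules.
LINK_DENSITY_THRESHOLD = 0.5
NOISE_CUES = ("copyright", "advertisement", "login", "survey")
BLOCK_TAGS = {"div", "table", "td", "ul", "p", "nav", "footer"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    """A minimal DOM node: tag name, parent, children, and its own text chunks."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []

    def all_text(self):
        """Own text first, then descendant text, joined with spaces."""
        parts = self.text + [child.all_text() for child in self.children]
        return " ".join(p for p in parts if p)

    def link_text(self):
        """Only the text that sits inside <a> elements of this subtree."""
        if self.tag == "a":
            return self.all_text()
        parts = [child.link_text() for child in self.children]
        return " ".join(p for p in parts if p)


class DomBuilder(HTMLParser):
    """Builds a simplified DOM tree on top of the standard-library HTML parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element so sloppy HTML does not break the tree.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        data = data.strip()
        if data:
            self.current.text.append(data)


def suspicious_blocks(root):
    """Return (tag, text) pairs for block nodes flagged by the hypothetical rules."""
    flagged = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.tag in BLOCK_TAGS:
            text = node.all_text()
            total = len(text.split())
            if total:
                density = len(node.link_text().split()) / total
                has_cue = any(cue in text.lower() for cue in NOISE_CUES)
                if density >= LINK_DENSITY_THRESHOLD or has_cue:
                    flagged.append((node.tag, text))
                    continue  # do not descend into a block that is already flagged
        stack.extend(reversed(node.children))  # keeps document order
    return flagged


if __name__ == "__main__":
    page = """<html><body>
    <div id="nav"><a href="/">Home</a> <a href="/news">News</a> <a href="/about">About</a></div>
    <div id="main"><p>Web noise removal improves retrieval quality for search engines.</p></div>
    <div id="foot">Copyright 2012 Example Site</div>
    </body></html>"""
    builder = DomBuilder()
    builder.feed(page)
    builder.close()
    for tag, text in suspicious_blocks(builder.root):
        print(tag, "->", text)
```

Link density and cue words are only two common heuristics; the thesis's own rule set may differ, and the output here is a list of suspicious blocks, not yet a final noise decision.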
The paper analyzes in detail the method for identifying web noise, the process of building the suspicious web noise representation model, the specific algorithms, and the methods for similarity calculation and threshold selection. Using Heritrix and Lucene as the underlying framework, the author designs a simulation environment for the proposed noise recognition and elimination method and runs simulation experiments on real web pages in that environment. The experiments show that the proposed web noise identification and elimination method is feasible and effective: compared with other similar methods, it improves both the accuracy and the efficiency of web noise identification.
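The abstract states that suspicious noise is finally confirmed through VSM retrieval, a similarity calculation, and a threshold, but gives no formulas. A minimal Python sketch of one plausible reading: the query and each suspicious block are represented as term-frequency vectors, their cosine similarity is computed, and a block whose similarity to the query falls below a threshold is confirmed as noise, while a block that matches the query is kept. The tokenizer, `SIMILARITY_THRESHOLD = 0.2`, and the names `term_vector`, `cosine_similarity`, and `confirm_noise` are assumptions for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical threshold: the thesis discusses threshold selection, but this value is illustrative.
SIMILARITY_THRESHOLD = 0.2


def term_vector(text):
    """Raw term-frequency vector over lowercase alphanumeric tokens (a plain VSM representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors; 0.0 if either is empty."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def confirm_noise(query, suspicious_blocks, threshold=SIMILARITY_THRESHOLD):
    """Split suspicious blocks into confirmed noise and blocks kept because they match the query."""
    q = term_vector(query)
    noise, kept = [], []
    for block in suspicious_blocks:
        score = cosine_similarity(q, term_vector(block))
        (kept if score >= threshold else noise).append(block)
    return noise, kept


if __name__ == "__main__":
    blocks = [
        "Home News About",
        "Related: web noise removal survey",
        "Copyright 2012 Example Site",
    ]
    noise, kept = confirm_noise("web noise removal", blocks)
    print("noise:", noise)
    print("kept:", kept)
```

Raw term frequency keeps the sketch short; TF-IDF weighting is a common refinement, and in practice the threshold would be tuned on labeled pages rather than fixed in advance.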

Keywords/Search Tags:

Webpage purifying, Web noise identification and elimination method, DOM, Web noise information representation model, VSM

Related items

1	Page To Noise And The Classification Algorithm
2	Research Of The Technologies In Identifying And Filtering Webpage Noise Information Based On The Proxy System
3	Research Of Web Page Purifying Method Based On Document Object Model
4	Research On Topical Webpage Denoising Based On Improved DOM Tree
5	Webpage Noise Reduction Application And Research In Interactive Television
6	Noise Reduction Method And Subspace Based Noise-robust Recognition Method For Radar HRRP Data
7	The Study On Algorithms And Applications Of Noise Cancelling Technology Based On Adaptive Filter
8	The Research Of Webpage Denoising Method Based On Classification Technology
9	Research And Implementation Of "Yilan" Intelligent News Client
10	Research On Noise Modeling And Elimination Of Multi-input Multi-output Power Line Communication