With the coming of the era of Internet big data, the booming network brings people a wealth of information resources. Faced with huge amounts of Internet information, how to access valuable information quickly and accurately has become a difficult problem. Information retrieval systems emerged accordingly, and search engines now provide a convenient way to access information; a great deal of research in this field has been done by many scholars. However, in the process of information collection, a general whole-network search engine ignores the subject and the processing order of the information, which yields broad, disordered, and uncorrelated results that require secondary processing before valuable information can be obtained.

To solve this problem, this paper studies correlation methods for information retrieval, proposes a method that can retrieve more in-depth information for a given field, and implements dynamic maintenance and optimization of the information indexes. The main work can be summarized in the following three aspects:

1) The web crawler Nutch, the distributed computing framework Hadoop, and the working procedure of MapReduce were studied; distributed crawling based on Nutch was realized, and the unstructured network information was stored as structured files (a MapReduce sketch follows this abstract).

2) Index building for information retrieval was achieved. The full-text indexing tool Lucene was studied, and an inverted index was constructed over the text crawled by Nutch, laying the foundation for further index processing (see the Lucene sketch below). An index pool model was proposed and constructed, and index pool maintenance and dynamic optimization were achieved by means of an index evaluation function, thus improving the quality of the index (see the evaluation sketch below).

3) A network information collection and search system was designed and developed, providing collections sorted by users' interests and a timed information push service.
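
The abstract does not reproduce the thesis's actual MapReduce job, so the following is only a minimal sketch of the MapReduce working procedure mentioned in point 1: a standard term-count job over crawled text, written against the Hadoop org.apache.hadoop.mapreduce API. The class name TermCount and the tokenization rule are illustrative assumptions, not details from the thesis.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TermCount {
        // Map phase: emit (term, 1) for every whitespace-separated token in a page.
        public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text term = new Text();
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().toLowerCase().split("\\s+")) {
                    if (!token.isEmpty()) {
                        term.set(token);
                        context.write(term, ONE);
                    }
                }
            }
        }

        // Reduce phase: sum the counts collected for each term.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "term count");
            job.setJarByClass(TermCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class); // summing is associative, so reuse the reducer
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }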
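
As a minimal sketch of the inverted-index construction described in point 2, the following uses the standard Lucene IndexWriter API (assuming Lucene 5 or later); the field names url and content, the sample values, and the index path are assumptions for illustration, not details taken from the thesis.

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class PageIndexer {
        public static void main(String[] args) throws Exception {
            // Open (or create) an on-disk index; StandardAnalyzer tokenizes the text.
            FSDirectory dir = FSDirectory.open(Paths.get("index"));
            IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

            // One Document per crawled page: the URL is stored verbatim and left
            // unanalyzed, while the body text is analyzed into the inverted index.
            Document doc = new Document();
            doc.add(new StringField("url", "http://example.com/page", Field.Store.YES));
            doc.add(new TextField("content", "page text extracted by the crawler", Field.Store.NO));
            writer.addDocument(doc);

            writer.close();
        }
    }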
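
The abstract does not define the index evaluation function, so the following is a purely hypothetical illustration of the index pool idea: each entry carries usage statistics, is scored by hit frequency discounted by staleness, and the lowest-scoring entries are evicted during maintenance. Every class name, field, and the scoring formula here are assumptions, not the thesis's method.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class IndexPool {
        // Hypothetical record of one index entry's usage statistics.
        static class IndexEntry {
            final String term;
            final long hitCount;         // how often queries matched this entry
            final long lastAccessMillis; // last time a query touched it

            IndexEntry(String term, long hitCount, long lastAccessMillis) {
                this.term = term;
                this.hitCount = hitCount;
                this.lastAccessMillis = lastAccessMillis;
            }
        }

        // Hypothetical evaluation function: hit frequency discounted by age in days.
        // The actual function used in the thesis is not given in the abstract.
        static double evaluate(IndexEntry e, long nowMillis) {
            double ageDays = (nowMillis - e.lastAccessMillis) / 86_400_000.0;
            return e.hitCount / (1.0 + ageDays);
        }

        // Maintenance step: keep only the top 'capacity' entries by score.
        static List<IndexEntry> maintain(List<IndexEntry> pool, int capacity, long nowMillis) {
            List<IndexEntry> sorted = new ArrayList<>(pool);
            sorted.sort(Comparator.comparingDouble((IndexEntry e) -> evaluate(e, nowMillis)).reversed());
            return sorted.subList(0, Math.min(capacity, sorted.size()));
        }
    }

Any monotone function of such usage statistics would serve the same role; the point of the sketch is only that dynamic optimization reduces to periodically re-scoring and truncating the pool.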