Nutch-Based Distributed Search Engine Design And Research

Posted on: 2011-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Shi
Full Text: PDF
GTID: 2178360302490269
Subject: Computer software and theory
Abstract/Summary:
Building on the open source Nutch search engine framework, this thesis proposes a dynamic data block allocation mechanism. The mechanism runs on the Hadoop platform that underlies Nutch and stores file data as split blocks. The thesis also improves the task scheduling strategy and the data storage support of Nutch's data acquisition subsystem, so that parallel computation balances the cluster load effectively. We present the mathematical model of the dynamic block allocation mechanism together with its calculation formula. The data acquisition subsystem is built on the open source BDB; we give its architecture design model and complete the system integration. For evaluation we use a comparative experiment: the Nutch search engine system is configured and run both before and after the improvement, and the test results are collected. The experimental results show that dynamic block allocation evens out per-node measures in the cluster, such as the number of tasks and the task running time, which improves the distribution of load across the whole system and achieves load balancing.
Keywords/Search Tags: search engine, data acquisition, block dynamic allocation, load balancing
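
The abstract names a dynamic block allocation mechanism but does not reproduce its mathematical model or formula here. As a rough illustration of the general idea only, the sketch below uses a common greedy heuristic: each block goes to the node whose current load, relative to its capacity, is lowest. This is not the thesis's method. The function name `assign_blocks`, the capacity weights, and the node names are all hypothetical and chosen for the example.

```python
import heapq


def assign_blocks(block_sizes, node_capacities):
    """Greedy load-aware placement (illustrative, not the thesis's formula).

    block_sizes: list of block sizes, indexed by block id.
    node_capacities: dict of node name -> relative capacity weight (> 0).
    Returns (placement, loads): block ids per node and total size per node.
    """
    # Min-heap keyed on load/capacity; node name breaks ties deterministically.
    heap = [(0.0, name) for name in sorted(node_capacities)]
    heapq.heapify(heap)
    loads = {name: 0 for name in node_capacities}
    placement = {name: [] for name in node_capacities}

    # Place the largest blocks first so small blocks can even out the remainder.
    for block_id, size in sorted(enumerate(block_sizes), key=lambda p: -p[1]):
        _, name = heapq.heappop(heap)  # least relatively loaded node
        placement[name].append(block_id)
        loads[name] += size
        heapq.heappush(heap, (loads[name] / node_capacities[name], name))
    return placement, loads


if __name__ == "__main__":
    # Hypothetical cluster: two equally capable nodes, six blocks (sizes in MB).
    placement, loads = assign_blocks([64, 64, 64, 64, 32, 32],
                                     {"node-a": 1.0, "node-b": 1.0})
    print(placement)
    print(loads)
```

With equal capacities the two nodes in this example end up with the same total size. A real allocator on Hadoop would also have to consider replication, data locality, and live node metrics, none of which are modeled here.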