Design And Implementation Of Distributed Meta-search Information Collection System

Posted on: 2015-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: S C Liao
Full Text: PDF
GTID: 2308330452457125
Subject: Electronics and Communications Engineering
Abstract/Summary:
As online media become more and more popular, a variety of platforms are used to distribute information. Every day, all kinds of websites produce new content, which spreads across the Internet. Information is produced much faster than it can be consumed. As a result, large quantities of this widely scattered information are abandoned and wasted because people lack an appropriate way to collect it. This problem is likely to persist for a long time. Web spider and search engine technology offer a way to address it.

This thesis studies a distributed meta-search data collection system. It aims to collect information about product safety quickly and efficiently, providing data for an early warning system for food safety. The system is based on a master/slave architecture in which the master and the slaves have separate responsibilities. The master defines tasks and dispatches them to the slaves according to a dispatching strategy. Each slave receives tasks and launches spiders that request and download web pages. The master and the slaves exchange data over the Internet. Information is extracted from the downloaded pages with XQuery templates, and the resulting structured information is stored in an HBase database. Each slave records its running status and sends it to the master so that the master can tell whether the node is working properly. Because web pages are widely scattered and costly to collect, this thesis proposes using meta-search technology to take data from several search engines. Combining data from different engines can give the user a better result.

Finally, the system was built and deployed, and it ran stably. Functional and performance tests demonstrated the feasibility of the system. The thesis also makes some suggestions for improvement to address the remaining problems.
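
The abstract says that data from different search engines are combined, but it does not say how. As an illustration only, the following minimal Python sketch merges ranked URL lists using reciprocal rank fusion, a common rank-merging technique that is assumed here and is not taken from the thesis. The function names, the engine names, the constant `k`, and the URL normalization rule are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_url(url):
    """Build a comparison key: lowercase host, no scheme, no fragment,
    no trailing slash on the path. (Hypothetical rule, for illustration.)"""
    parts = urlsplit(url.strip())
    key = parts.netloc.lower() + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(results_by_engine, k=60):
    """Merge ranked URL lists from several engines.

    Each URL scores 1 / (k + rank) per engine that returned it
    (reciprocal rank fusion); duplicates are detected by normalized URL.
    """
    scores = defaultdict(float)
    first_seen = {}  # normalized key -> first original URL encountered
    for urls in results_by_engine.values():
        seen_in_engine = set()
        for rank, url in enumerate(urls, start=1):
            key = normalize_url(url)
            if key in seen_in_engine:
                continue  # count each page once per engine
            seen_in_engine.add(key)
            scores[key] += 1.0 / (k + rank)
            first_seen.setdefault(key, url)
    # Highest score first; ties broken by key for a stable order.
    ranked = sorted(scores, key=lambda key: (-scores[key], key))
    return [first_seen[key] for key in ranked]


if __name__ == "__main__":
    results = {
        "engine_a": ["http://example.com/a", "http://example.com/b"],
        "engine_b": ["http://EXAMPLE.com/b/", "http://example.com/c"],
    }
    for url in merge_results(results):
        print(url)
```

In this sketch a page returned by several engines rises above pages returned by only one, which is one plausible reading of "a better result"; a real system might instead weight engines differently or use relevance scores.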
Keywords/Search Tags: Blooming Information, Spider Technology, Distributed System, Meta Search, XQuery