Design And Implementation Of Distributed Computing System Based On Hadoop

Posted on:2016-02-18

Degree:Master

Type:Thesis

Country:China

Candidate:K Z Guo

Full Text:PDF

GTID:2308330470978586

Subject:Software engineering

Abstract/Summary:

This thesis studies how to process large volumes of data. The goal is to build a distributed computing system that is small, low-cost, high-performance, and low-risk. The system combines distributed and parallel computing. It is developed for Windows and runs inside a local area network as a small, secure cluster that processes large amounts of data quickly, addressing two problems of traditional computing frameworks: poor scalability and the computational bottleneck of a single machine. Based on a study of the Hadoop framework, the thesis presents two services: a distributed storage service and a distributed computing service. The system uses a three-layer architecture to run the cluster: the Master node controls global information, the Job node schedules tasks, and the Task node stores data and performs computation. Users can upload data in custom formats; the original data is not split a second time, so computation on certain specific native data types can be supported. The system exposes an algorithm API and extends the traditional MapReduce framework so that a computation can carry additional data sources, which simplifies processing that involves data interaction. The Reduce step is performed locally, which saves the network transfer time for intermediate Map data. The scheduling service calls a merge interface to produce the final result, which spreads the computational load and makes full use of the machines in the cluster. Users embed custom algorithms by adding a dynamic link library (DLL file); the algorithm is stored on the Task nodes in advance, which saves startup time. The machines in the cluster are assumed to be reasonably reliable, which simplifies the disaster-recovery design. Hardware resources are used preemptively during computation: while executing a task, a single machine loads its cache according to a configuration file to speed up the next computation.
The number of data blocks determines the number of threads; multiple threads are started in a thread-safe way to achieve high concurrency and make full use of the CPU. Finally, two kinds of test algorithms were built for the implemented system: a WordCount algorithm for comparison with a Hadoop cluster, and an image alignment algorithm that exercises data interaction. Analysis of the test data shows that the system responds in real time to computation requests on small amounts of data and that, as a small cluster running in a secure network environment, it provides enough computing power for typical small and medium-sized workloads, meeting the design goals.
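
The ideas of one thread per data block, a local Reduce, and a final merge can be illustrated with a minimal WordCount sketch. This is not the thesis's implementation (which targets Windows and loads algorithms from DLL files); it is a single-process Python illustration, and the names `map_and_local_reduce`, `merge`, and `word_count` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_and_local_reduce(block: str) -> Counter:
    """Count the words of one block where the block lives.

    Returning a Counter (instead of raw (word, 1) pairs) stands in for the
    local Reduce: only the compact per-block totals need to be sent onward.
    """
    return Counter(block.split())


def merge(partials) -> Counter:
    """Combine the per-block counts into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(blocks: list[str]) -> Counter:
    """Run one worker thread per data block, then merge the partial counts."""
    if not blocks:
        return Counter()
    # Assumption: the thread count equals the number of data blocks.
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        partials = list(pool.map(map_and_local_reduce, blocks))
    return merge(partials)


blocks = ["hadoop map reduce", "map task", "reduce task task"]
counts = word_count(blocks)
print(counts["task"])  # 3
```

In CPython, threads do not speed up CPU-bound counting because of the global interpreter lock, so the sketch shows only the structure (block-level parallelism, local aggregation, one merge step), not the performance characteristics reported in the thesis.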

Keywords/Search Tags:

Distributed Computing, Hadoop, Real Time Response

Related items

1	Research On Real-Time Response About Heterogeneous Distributed System
2	Schedulability Analysis Algorithms For Distributed Hard Real-Time Systems
3	Analysis And Application Development Of Hadoop Distributed Computing Platform
4	Research On Job Scheduling Method Under Hadoop Platform
5	The Research And Implementation Of Real-Time Middleware Based On Priority
6	Research And Implementation On The Real-Time Assurance Mechanisms Of Robotic Distributed Computing Framework
7	The Design And Application Of Distributed Real-Time Flow Computing Framework Based On Akka
8	Research & Development Of Distributed Stream Real-time Computing Framework
9	Research Of Real-Time Task Graphs Response Time Analysis Technology And Implementation Of Tool
10	A hybrid approach for derivation of tight execution time bounds of program-segments and service time bounds of simple object methods in real-time distributed computing systems