Design And Implementation Of Online Data Processing System Based On Hadoop

Posted on:2016-02-24

Degree:Master

Type:Thesis

Country:China

Candidate:R N Guo

Full Text:PDF

GTID:2298330467993016

Subject:Computer Science and Technology

Abstract/Summary:

With the widespread use of computer technology and the Internet in all aspects of social life, the amount of data is growing explosively. The storage and processing of big data has become a new challenge for many companies, and big data processing applications have attracted widespread attention. Hadoop is an open source platform that handles the storage and processing of big data through distributed computing. It implements HDFS and the MapReduce programming model, which can handle large-scale data. However, these techniques are designed mainly for offline data and cannot be used for real-time computing on big data.

Moreover, with the popularity of intelligent terminals, users can access the Internet anytime and anywhere. The steady growth of streaming data and the real-time nature of Internet content and services demand greater computing power, which has given rise to real-time distributed computing platforms such as Storm. Just as Hadoop represents the storage and processing of offline data, Twitter Storm represents the computing of streaming data. Yahoo! S4, Spark from the University of California, Berkeley, and Facebook's Puma are other popular real-time computing frameworks. But these frameworks focus only on real-time computing and do not provide a data source access service. Users must not only deploy these frameworks themselves but also learn the corresponding programming models, and this high learning cost reduces efficiency.

This paper designs and implements a distributed, scalable platform architecture for online data, together with a system that provides data storage and statistics services. Users can focus on their own business logic rather than on the programming models of the underlying frameworks, which makes developing online data tasks more convenient. The system is based on the Hadoop platform and uses Storm as its real-time computing framework, which provides the runtime environment for online tasks. The system uses HBase, a key-value database, as its main storage, so that it can continue to provide stable service under high concurrency. In addition, the system provides users with a unified set of communication rules. Users can customize their business processing logic based on these rules, which greatly improves productivity.
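
The separation described above, where users supply only business logic and the platform handles streaming and storage, can be illustrated with a minimal Python sketch. It does not reproduce the thesis's actual interfaces or communication rules, which the abstract does not specify: `KeyValueStore`, `run_online_task`, and `count_page_views` are hypothetical names, an in-memory dictionary stands in for HBase, and a plain list stands in for a Storm stream.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical record and handler shapes, assumed for illustration only.
Record = Dict[str, str]
Handler = Callable[[Record], Iterable[Tuple[str, int]]]


class KeyValueStore:
    """In-memory stand-in for a key-value store such as HBase."""

    def __init__(self) -> None:
        self._cells: Dict[str, int] = defaultdict(int)

    def increment(self, key: str, amount: int) -> None:
        # Add the amount to the counter stored under the key.
        self._cells[key] += amount

    def get(self, key: str) -> int:
        # Return the stored counter, or 0 if the key was never written.
        return self._cells.get(key, 0)


def run_online_task(
    stream: Iterable[Record], handler: Handler, store: KeyValueStore
) -> None:
    """Pass each incoming record to the user's handler and persist
    the (key, amount) pairs that the handler emits."""
    for record in stream:
        for key, amount in handler(record):
            store.increment(key, amount)


def count_page_views(record: Record) -> Iterable[Tuple[str, int]]:
    """Example user-supplied business logic: one counter per page."""
    yield ("views:" + record["page"], 1)


# A plain list stands in for an unbounded stream of events.
events = [{"page": "home"}, {"page": "search"}, {"page": "home"}]
store = KeyValueStore()
run_online_task(events, count_page_views, store)
```

In this sketch the user writes only `count_page_views`; swapping in a different handler changes the statistics collected without touching the runner or the store.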

Keywords/Search Tags:

Hadoop, distributed computing, streaming data, Storm, online data

Related items

1	Design And Realization Of A Online Data Mining System Based On Hadoop
2	Design And Implementation Of Distributed Data Examination System Based On Hadoop
3	Design And Implementation Of The Online Shopping System Based On Hadoop Cloud Computing Framework
4	A Distributed Cache And Analysis Platform For Large Scale Streaming Data Based On Kafka
5	Online Learning Algorithms For Classification Of Streaming Data
6	Design And Implementation Of Distributed Data Storage Based On Hadoop
7	The Research And Application Of Distributed System Based On Hadoop
8	Benchmarking And Tuning Distributed Streaming Platforms
9	Research On Online Streaming Feature Selection Algorithm Based On Granular Computing Theory
10	Research And Implementation Of Integration Of R Language And Hadoop