Design And Implementation Of Big Data Cloud Storage And Integrated Application System

Posted on:2016-06-21

Degree:Master

Type:Thesis

Country:China

Candidate:G H Huang

GTID:2348330491461758

Subject:Computer technology

Abstract/Summary:

As a big data application, a government department has accumulated a wide variety of data, amounting to petabytes (PB), and produces more than 1 TB of new data every day. The data come from non-unified sources, take diverse types, are stored in different ways, and are served by scattered business systems. At the same time, users expect ever faster responses to data retrieval and application requests, while the existing system suffers a sharp decline in both overall performance and full-text search capability. A further problem is how to load external data in multiple source formats into the database efficiently and quickly. There is therefore an urgent need to apply big data processing technology to design a big data storage and integrated application system for this business. This thesis discusses how to solve these problems with Hadoop, ElasticSearch, and ETL applications.

Traditional relational databases and traditional full-text databases hit serious performance bottlenecks when processing big data. These bottlenecks must be addressed with a distributed data comparison engine and distributed full-text search technology, while high-frequency loading from multiple data sources must be handled by an ETL tool. At present, distributed data search based on the Hadoop architecture, ElasticSearch distributed full-text retrieval, and the Kettle ETL application can meet these requirements. However, problems remain in the comparison and retrieval efficiency for address-type data and in the efficiency of high-frequency incremental loading from multiple data sources. The address matching algorithm, Chinese word segmentation, and Kettle's built-in data loading plug-in therefore need to be improved and optimized. To solve these problems, the main work of this thesis includes the following:
(1) Analyze the overall architecture and functional design of the system.
(2) Establish a distributed data matching engine and optimize the address comparison algorithm.
(3) Establish a distributed full-text search application and improve full-text search efficiency.
(4) Select a suitable ETL extraction method for high-frequency incremental loading, and improve data loading performance through multithreaded processing and optimized loading code.

System testing and implementation show that the Hadoop- and ElasticSearch-based distributed data matching engine and full-text retrieval technology meet the requirements of the organization studied. They solve the problems of data comparison and full-text retrieval, achieve high-frequency incremental loading from multiple data sources, reduce overall investment, improve overall system performance, and integrate the business systems.
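The abstract does not say how the address comparison algorithm was optimized. The sketch below is a minimal illustration under assumed choices. It normalizes addresses with Unicode NFKC folding, which turns full-width digits such as "１２" into "12", and it scores pairs with a Levenshtein-based similarity. It also uses "blocking" on a short address prefix so that only records sharing a key are compared, which mirrors the map (emit key) / reduce (compare within key) split a Hadoop job would use. All function names, the prefix length, and the threshold are hypothetical and are not taken from the thesis.

```python
import unicodedata
from collections import defaultdict
from itertools import combinations


def normalize_address(addr: str) -> str:
    """Fold full-width characters (NFKC) and drop whitespace/punctuation."""
    text = unicodedata.normalize("NFKC", addr)
    # Keep letters, digits and CJK ideographs; drop spaces and punctuation.
    return "".join(ch for ch in text if ch.isalnum())


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance using two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def address_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized addresses."""
    na, nb = normalize_address(a), normalize_address(b)
    if not na and not nb:
        return 1.0
    return 1.0 - edit_distance(na, nb) / max(len(na), len(nb))


def match_addresses(records, block_len=3, threshold=0.9):
    """Group (id, address) records by a normalized prefix ("map"), then
    compare pairs only inside each group ("reduce")."""
    blocks = defaultdict(list)
    for rec_id, addr in records:
        blocks[normalize_address(addr)[:block_len]].append((rec_id, addr))
    matches = []
    for group in blocks.values():
        for (id1, a1), (id2, a2) in combinations(group, 2):
            if address_similarity(a1, a2) >= threshold:
                matches.append((id1, id2))
    return matches
```

For example, `match_addresses([(1, "广州市天河区中山大道12号"), (2, "广州市 天河区，中山大道１２号"), (3, "深圳市南山区科技园1号")])` returns `[(1, 2)]`. A pure edit-distance score would also accept addresses that differ only in the house number, so a production matcher would likely compare such components exactly.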
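Full-text search over Chinese text depends on word segmentation, because Chinese has no spaces between words. In ElasticSearch this is handled by an analyzer, for example the official `analysis-smartcn` plugin. The thesis's improved segmentation method is not described here. The sketch below therefore shows a different, classic dictionary-based technique, forward maximum matching (FMM), feeding a toy inverted index with AND queries. The dictionary, `max_len`, and all function names are illustrative assumptions.

```python
from collections import defaultdict


def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching: at each position take the longest
    dictionary word (up to max_len chars); fall back to one character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_index(docs, vocab):
    """Inverted index: token -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in fmm_segment(text, vocab):
            index[token].add(doc_id)
    return index


def search(query, index, vocab):
    """Return ids of documents containing every query token (AND)."""
    terms = fmm_segment(query, vocab)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

With the vocabulary `{"广州市", "天河区", "中山大道"}`, the text "广州市天河区中山大道" is segmented as `["广州市", "天河区", "中山大道"]`. A query for "中山" splits into single characters and finds nothing, because the document was indexed with the longer token "中山大道". This recall gap is one reason segmentation quality directly affects search effectiveness.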
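The abstract mentions choosing an ETL extraction method for high-frequency incremental loads and speeding up loading with multiple threads, but it does not name the method. The sketch below assumes timestamp (watermark) based extraction, which is one common option alongside triggers or log-based change capture. Only rows whose `updated_at` exceeds the last watermark are extracted. They are deduplicated per key so that concurrent batches never race on the same row, then loaded in parallel batches. The watermark is then advanced. It illustrates the logic in Python rather than as Kettle steps. `InMemoryTarget`, the field names, and the batch parameters are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryTarget:
    """Stand-in for the target table. With a real database each worker would
    typically use its own connection; this lock only guards the shared dict."""

    def __init__(self):
        self.rows = {}
        self._lock = threading.Lock()

    def upsert_batch(self, batch):
        with self._lock:
            for row in batch:
                self.rows[row["id"]] = row
        return len(batch)


def incremental_load(source_rows, target, watermark, batch_size=500, workers=4):
    """Load rows changed after `watermark`; return (new_watermark, rows_loaded)."""
    # 1. Timestamp-based extraction: only rows modified since the last run.
    changed = [r for r in source_rows if r["updated_at"] > watermark]
    if not changed:
        return watermark, 0
    # 2. Keep only the latest version of each key so parallel batches
    #    never write the same id.
    latest = {}
    for row in changed:
        prev = latest.get(row["id"])
        if prev is None or row["updated_at"] >= prev["updated_at"]:
            latest[row["id"]] = row
    rows = list(latest.values())
    # 3. Split into batches and load them on a thread pool.
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        loaded = sum(pool.map(target.upsert_batch, batches))
    # 4. Advance the watermark to the newest change seen in this run.
    return max(r["updated_at"] for r in changed), loaded
```

Deduplicating before batching is what makes the parallel upserts order-independent. A strict `>` comparison can miss rows that commit late with a timestamp equal to the old watermark, so real pipelines often re-read a small overlap window.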

Keywords/Search Tags:

Big Data, Cloud Storage, Distributed Full-text Retrieval, Data Comparison

Related items

1	Design And Implementation Of Full-Text Search System Based On Cloud Storage
2	Research On Key Technologies Of Massive Data Storage System For Full Text Retrieval
3	Massive Data Storage And Full-text Search
4	Research And Implementation Of Ciphertext Full-text Retrieval Technology In Cloud Storage Environment
5	Research And Implementation Of Storing Simulation Resources On Cloud Storage
6	Research On Secure Storage And Ciphertext Query Of Cloud Data
7	A Trusted Control Model Of Cloud Storage
8	Cloud Storage System For Massive Data And Applied Research
9	AN EVALUATION OF THE APPLICABILITY OF RANKING ALGORITHMS TO IMPROVING THE EFFECTIVENESS OF FULL TEXT RETRIEVAL
10	Research On Key Technologies Of Distributed Storage In Cloud Computing