Research On Key Technologies Of Massive Data Storage System For Full Text Retrieval

Posted on:2022-12-23

Degree:Master

Type:Thesis

Country:China

Candidate:S B Gao

Full Text:PDF

GTID:2568306914478714

Subject:Information and Communication Engineering

Abstract/Summary:

With the rapid development of the Internet and big data, all kinds of data are generated continuously, and this geometric growth poses ever greater challenges to data-center storage. At the same time, the growing volume of data places higher demands on the storage and retrieval efficiency of information retrieval technology. Because of this explosive growth, and because information retrieval systems typically rely on a single storage back end, distributed storage and the tight integration of information retrieval with the underlying storage are becoming key to how data centers solve the problem of storing and retrieving massive data. This thesis studies and implements a distributed storage system for massive data that serves full-text retrieval and offers high availability, high reliability, and high space utilization.

In building the system, data security is treated as the overriding principle ("data security is life"). The thesis analyzes current data redundancy strategies and compares the advantages and disadvantages of replication and erasure coding. Based on this comparison, and in order to store more data in limited space, it adopts erasure coding. It introduces the concept of an EC group, stores data within EC groups in erasure-coded form, designs the read and write paths in detail, and handles abnormal conditions that arise during transmission. To improve data reliability and system stability, it provides a complete functional design of the data scanning, reconstruction, and migration modules, among others. On the database side, it builds a distributed database framework based on Raft and RocksDB. Together these components form a distributed storage system capable of storing massive data.

The thesis also analyzes existing full-text retrieval frameworks and the limitations of their back-end storage. It implements a simple full-text retrieval framework based on Lucene and connects it to the distributed storage back end, yielding a massive data distributed storage system for full-text retrieval. Experiments verify the feasibility of the scheme.
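The space argument for erasure coding over replication, and the reconstruction step mentioned in the abstract, can be illustrated with a minimal Python sketch. It assumes a hypothetical EC group made of k data chunks plus a single XOR parity chunk (m = 1), which tolerates the loss of any one chunk. The abstract does not state which code or parameters the thesis uses; production systems more commonly use Reed-Solomon codes with m of 2 or more. All function names here are illustrative, not the thesis's API.

```python
from __future__ import annotations

from functools import reduce


def split_into_chunks(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-length chunks, zero-padding the tail."""
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\x00")
    return [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_group(data: bytes, k: int) -> list[bytes]:
    """Hypothetical EC group: k data chunks followed by one XOR parity chunk."""
    chunks = split_into_chunks(data, k)
    return chunks + [reduce(xor_bytes, chunks)]


def reconstruct(group: list[bytes | None]) -> list[bytes]:
    """Rebuild at most one lost chunk (marked None) by XOR-ing all survivors."""
    missing = [i for i, c in enumerate(group) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost chunk")
    repaired = list(group)
    if missing:
        survivors = [c for c in group if c is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


def decode_group(group: list[bytes], original_len: int) -> bytes:
    """Concatenate the data chunks (parity excluded) and strip the padding."""
    return b"".join(group[:-1])[:original_len]


def overhead_replication(copies: int) -> float:
    """Raw bytes stored per byte of user data under n-way replication."""
    return float(copies)


def overhead_erasure(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for k data + m parity chunks."""
    return (k + m) / k


if __name__ == "__main__":
    payload = b"full-text index segment"
    group = encode_group(payload, k=4)
    group[2] = None  # simulate losing the node that held data chunk 2
    restored = reconstruct(group)
    assert decode_group(restored, len(payload)) == payload
    print(overhead_replication(3), overhead_erasure(4, 1))  # prints: 3.0 1.25
```

With k = 4 and m = 1 the group stores 1.25 raw bytes per byte of user data, against 3 bytes for three-way replication. The price is extra CPU work and network I/O whenever a lost chunk must be rebuilt from the surviving chunks.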

Keywords/Search Tags:

distributed storage, erasure coding, high reliability, full-text retrieval

Related items

1	Study On Coding Workflow In Erasure-coded Storage Systems
2	Research On Distributed Deduplication Storage System Based On Erasure Coding
3	Distributed Storage Framework Design For Heterogeneous Data Reliability Requirements
4	Efficient Erasure Coding in Distributed Storage System
5	A Secure Distributed Storage System Based On AONT And Erasure Coding
6	Research On Erasure Code Repair Problem In Distributed Storage System
7	Research On Index Management And File Pretreatment Of Distributed Full-text Retrieval System
8	Based On Erasure Codes Distributed Storage System Design And Implementation
9	Research On High-efficient Data Transmission Techniques In Large-Scale Distributed Erasure-Coded Storage Systems
10	Research On Fault-tolerant Optimization Strategy Based On Erasure Coding In Distributed Storag