With the rapid development of the Internet and big data, data of all kinds is generated continuously, and its geometric growth poses increasing challenges for data center storage. At the same time, this growth places higher demands on the storage and retrieval efficiency of information retrieval technology. Because data is growing explosively and the back-end storage used by information retrieval technology is monolithic, distributed storage and the tight integration of information retrieval with the underlying storage are becoming key to solving the problem of massive data storage and retrieval in data centers. This thesis studies and implements a distributed storage system for massive data, oriented toward full-text retrieval, that offers high availability, high reliability, and high space utilization.

The design of the storage system follows the principle that "data security is life". This thesis analyzes current data redundancy strategies and compares the advantages and disadvantages of replication and erasure coding. Based on this comparison, and in order to store more data in limited space, it adopts an erasure coding strategy. It introduces the concept of the EC group, stores data within each EC group in erasure-coded form, designs the data read and write processes in detail, and handles abnormal conditions that arise during transmission. To improve data reliability and system stability, it provides a complete functional design of the data scanning, reconstruction, migration, and related modules. For metadata storage, it establishes a distributed database framework based on Raft and RocksDB. Together, these components form a distributed storage system capable of storing massive data.

This thesis also analyzes existing full-text retrieval frameworks and the limitations of their back-end storage. A simple full-text retrieval framework is implemented on top of Lucene and connected to the distributed storage back end, yielding a massive data distributed storage system for full-text retrieval. The feasibility of the scheme is verified through experiments.