Research And Application In Storage System Of The Massive Medium And Small File Based On MongoDB

Posted on: 2017-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhou
GTID: 2308330485992056
Subject: Computer Science and Technology
Abstract/Summary:
With the development of Internet technology and the growing popularity of social networking, the volume of heterogeneous Internet data keeps increasing, and storage optimization for massive numbers of small files has become an important research direction in massive data storage. Distributed file systems such as HDFS and TFS are not general-purpose solutions for massive small files. NoSQL technology, however, has gradually matured; its distributed architecture, simplicity, and flexibility make it a possible solution for massive small-file storage.

The Meridian Project Data Center is responsible for processing the space science data generated by detection equipment across the country. As space exploration data has grown, the Data Center had collected a cumulative 9.8018 million scientific data files by the end of 2015, with a total size of about 3.45 TB. About 90 percent of these are small files of 100 KB or less; the rest are a few large files. At present, the Meridian Project stores scientific data in a traditional distributed file system. With so many small files, this leads to high disk I/O, long data backup times, and low storage efficiency.

Based on the characteristics of Meridian Project data files and a full analysis of the advantages and disadvantages of today's mainstream massive data storage solutions, this paper proposes ZW-Mongo, a space science data storage model built on MongoDB. The storage model has three design aspects: (1) small files are stored directly, using MongoDB's BSON data structure, which improves small-file storage efficiency; (2) large files are divided into blocks for storage, with a metadata collection and a block data collection; (3) historical versions and soft deletion are used to improve file utilization.
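The three design aspects can be illustrated with a minimal in-memory sketch. It is not the thesis implementation: two plain Python lists stand in for the metadata collection and the block data collection, and the names `FileStore`, `SMALL_FILE_LIMIT`, and `BLOCK_SIZE`, the field names, and the concrete size values are all assumptions made for illustration.

```python
# Hypothetical sketch of the storage model; lists stand in for MongoDB collections.
SMALL_FILE_LIMIT = 100 * 1024  # assumed cutoff: files up to 100 KB are stored inline
BLOCK_SIZE = 256 * 1024        # assumed block size for large files


class FileStore:
    def __init__(self):
        self.files = []   # stand-in for the metadata collection
        self.blocks = []  # stand-in for the block data collection

    def put(self, name, data):
        """Store a new version of `name`; earlier versions are kept."""
        version = 1 + max(
            (d["version"] for d in self.files if d["name"] == name), default=0
        )
        doc = {"name": name, "version": version,
               "length": len(data), "deleted": False}
        if len(data) <= SMALL_FILE_LIMIT:
            # (1) Small file: the bytes live directly in the metadata document.
            doc["data"] = data
        else:
            # (2) Large file: split into fixed-size blocks in a second collection.
            for n, offset in enumerate(range(0, len(data), BLOCK_SIZE)):
                self.blocks.append({
                    "name": name, "version": version, "n": n,
                    "data": data[offset:offset + BLOCK_SIZE],
                })
        self.files.append(doc)
        return version

    def get(self, name, version=None):
        """Return the requested version, or the newest one that is not deleted."""
        candidates = [
            d for d in self.files
            if d["name"] == name and not d["deleted"]
            and (version is None or d["version"] == version)
        ]
        if not candidates:
            raise KeyError(name)
        doc = max(candidates, key=lambda d: d["version"])
        if "data" in doc:
            return doc["data"]
        parts = sorted(
            (b for b in self.blocks
             if b["name"] == name and b["version"] == doc["version"]),
            key=lambda b: b["n"],
        )
        return b"".join(b["data"] for b in parts)

    def soft_delete(self, name):
        """(3) Soft delete: flag every version as deleted, but remove nothing."""
        count = 0
        for d in self.files:
            if d["name"] == name and not d["deleted"]:
                d["deleted"] = True
                count += 1
        return count
```

In this sketch a soft delete hides all versions of a file at once; whether the thesis hides all versions or only the latest one is not stated in the abstract.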
The ZW-Mongo storage model improves the efficiency of storing and accessing small files and effectively reduces the cost of file management.

By analyzing the shortcomings of MongoDB's data balancing strategy, this paper proposes a data balancing strategy based on consistent hashing and constructs a file storage procedure on top of it.

Based on the ZW-Mongo storage model, this paper designs and develops a set of REST-style data access interfaces and implements the data balancing algorithm in the access interface, which makes it easy to add and remove data nodes. Comparative tests between the ZW-Mongo data interface and the traditional distributed file system show that the ZW-Mongo storage model is superior to the traditional storage model in data reading, querying, and backup, while the two are similar in data writing. Tests of data balancing further show that adding virtual nodes promotes a balanced distribution of data across data nodes. The ZW-Mongo storage model has been applied in practice to the data storage system of the Meridian Project Data Center, with good results.
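The idea of consistent hashing with virtual nodes can be sketched as follows. This is a generic illustration of the technique, not the algorithm from the thesis: the class name `HashRing`, the choice of MD5 as the hash function, the `node#i` naming of virtual nodes, and the default of 100 virtual nodes per data node are all assumptions.

```python
import bisect
import hashlib
from collections import Counter


def _hash(key):
    """Map a string to a point on the ring (MD5 is an assumed choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical data node
        self._ring = []       # sorted list of (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each data node is placed on the ring at `vnodes` different points.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key):
        """Return the node owning `key`: the first ring point at or after its hash."""
        if not self._ring:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around to the start of the ring
        return self._ring[idx][1]


if __name__ == "__main__":
    keys = [f"file-{i}" for i in range(10000)]
    for vnodes in (1, 100):
        ring = HashRing(["node-a", "node-b", "node-c"], vnodes=vnodes)
        # Compare how evenly keys spread with few versus many virtual nodes.
        print(vnodes, Counter(ring.node_for(k) for k in keys))
```

When a node is added, only the keys that now fall on its ring points change owner, and they all move to the new node; keys on other nodes stay where they are. This is the property that makes adding and removing data nodes cheap compared with rehashing every key.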
Keywords/Search Tags: Massive Medium and Small File, Storage Model, Data Interface, Balancing Algorithms