
Efficient Techniques Of Data Retrieval In Distributed Storage Systems

Posted on: 2018-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Hong
Full Text: PDF
GTID: 2428330590977655
Subject: Computer Science and Technology
Abstract/Summary:
In the age of big data, cloud storage systems must support efficient concurrent queries for a growing range of data-intensive applications, and indexes play a central role in doing so. Tree and hash structures are the two most widely used database index structures. Deploying these traditionally centralized structures on cloud systems is difficult, however, because distribution brings new challenges for index design. Related work began with the rise of peer-to-peer (P2P) networks. The prevalent method at that time was the two-layer indexing scheme: “local indexes” are first built on ordinary network nodes, and selected index nodes are then published to other network nodes to construct the “global index”. RT-CAN, CG-Index, and the Cayley graph based index are three typical results.
Today, data centers increasingly serve as the infrastructure of cloud systems, and their backbones, data center networks (DCNs), have attracted attention in the past few years. Unlike traditional P2P networks, whose emphasis lies more on the functionality provided by network connectivity, DCNs focus purely on the physical topology of the cabling and routing among servers. Thanks to the regularity of DCN topologies, the existing design can be revisited: index construction may not need to be fully separated from the overlay network. This research begins by exploring the feasibility of this combination. Our first target is HCN, a dual-port DCN, for which we propose RT-HCN, a tailored indexing scheme that integrates the R-Tree index structure. The scheme comprises a specific index mapping technique for construction and supporting query algorithms for use. We also combine practical techniques to address data skew and query false positives, greatly improving indexing adaptability and query performance. RT-HCN is a successful attempt at the new distributed indexing design philosophy.
We then dig further along two lines: network topology types and basic index structures. First, the regularity of HCN's scaling rule is distinctive among server-centric DCNs, which makes it possible to extend the scheme. Second, R-Tree was originally adopted as the basic multi-dimensional data structure, and it has inherent limitations in building efficiency and node overlap that can be deliberately optimized in some cases. Finally, a universal design idea can be put forward based on the optimized index structure and the generality of the topology. Experiments are conducted on Amazon's EC2 platform and cover two issues: a performance test of RT-HCN itself, including controlled-variable tests and a comparison with competitors, and an assessment of the effect of the potential optimizations for RT-HCN. The research has theoretical and practical significance and offers experience and guidance for distributed indexing design to a certain extent. The results indicate the scheme's application prospects in future data centers.
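The two-layer idea described in the abstract can be illustrated with a minimal sketch. It is not the RT-HCN scheme: all names (`StorageNode`, `GlobalIndex`, `range_query`) are hypothetical, a linear scan stands in for the R-Tree, each node publishes a single bounding box as its "global" entry, and no mapping onto the HCN topology is modelled. The sketch only shows how a query is routed through the global layer and then filtered locally, including how a false positive at the global layer arises.

```python
from dataclasses import dataclass, field

# Assumed 2-D types for illustration only.
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = tuple[float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap (boundaries included)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(r: Rect, p: Point) -> bool:
    """True if point p lies inside rectangle r (boundaries included)."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


@dataclass
class StorageNode:
    """Hypothetical storage node holding points and a local index."""
    name: str
    points: list[Point] = field(default_factory=list)

    def bounding_box(self) -> Rect:
        # The summary this node publishes to the global layer.
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        return (min(xs), min(ys), max(xs), max(ys))

    def local_query(self, window: Rect) -> list[Point]:
        # Local layer: a linear scan stands in for a real R-Tree here.
        return [p for p in self.points if contains(window, p)]


class GlobalIndex:
    """Hypothetical global layer: published bounding boxes per node."""

    def __init__(self) -> None:
        self.entries: list[tuple[Rect, StorageNode]] = []

    def publish(self, node: StorageNode) -> None:
        self.entries.append((node.bounding_box(), node))

    def route(self, window: Rect) -> list[StorageNode]:
        # Candidate nodes only; a candidate may hold no matching data.
        return [n for box, n in self.entries if intersects(box, window)]


def range_query(index: GlobalIndex, window: Rect) -> list[Point]:
    """Route through the global layer, then filter on each candidate node."""
    results: list[Point] = []
    for node in index.route(window):
        results.extend(node.local_query(window))
    return results


index = GlobalIndex()
index.publish(StorageNode("a", [(0.0, 0.0), (4.0, 4.0)]))
index.publish(StorageNode("b", [(10.0, 10.0), (12.0, 11.0)]))

hits = range_query(index, (3.0, 3.0, 5.0, 5.0))        # only node "a" is visited
candidates = index.route((1.0, 1.0, 3.0, 3.0))         # node "a" is a candidate
false_positive = range_query(index, (1.0, 1.0, 3.0, 3.0))  # but it holds no match
```

In the second query the window overlaps node "a"'s published box without containing any of its points, so the node is contacted for nothing. This is the kind of querying false positive the abstract says RT-HCN takes steps to reduce; how it does so is not stated here.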
Keywords/Search Tags: Distributed index, R-Tree, Data center network, Server-centric