Research On Organizational Strategy And Key Technologies Of Modular Tree-connected Disk Arrays

Posted on: 2011-07-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z K Wang
Full Text: PDF
GTID: 1118330338988113
Subject: Computer system architecture
Abstract/Summary:
With the explosive growth of digital information, storage systems face ever higher demands for scalability, reliability and availability. RAID (Redundant Array of Independent Disks) is a key device in storage systems. Because RAID offers high reliability and parallelism, it has attracted academic and industrial attention since its conception. However, the existing centrally controlled RAID architecture easily becomes a performance bottleneck and scales poorly. High-end enterprise RAID systems are monolithic systems built from customized hardware, with multiple redundant components and paths to guarantee performance and reliability, but they are expensive and difficult to upgrade. Finding a new way to organize large-scale RAID that offers high performance, low cost, high reliability and easy expansion has therefore become an urgent research problem.

Building modular storage systems from standard storage units is the trend for large-scale storage. This approach offers several advantages over monolithic systems, such as a better price-performance ratio and greater scalability. However, existing research on modular storage systems ignores how the storage units are organized. Drawing on topology research in high-performance computing and considering the characteristics of storage I/O, this paper proposes MT2RAID, a modular tree-connected multi-tier RAID architecture for very large disk arrays. MT2RAID is built from a collection of commodity components containing CPU, RAM, disks and interconnection interfaces. Storage units are connected through fat-tree-based interconnection channels.

Each storage unit in MT2RAID is an independent storage subsystem that provides only moderate reliability and performance. To meet different performance and reliability requirements, this paper designs and implements three data layouts. MT2RAID-S distributes data across all storage nodes without redundancy.
MT2RAID-M stores data on mirrored storage units. MT2RAID-P uses parity across the storage units. Given the hierarchy of MT2RAID, write logging and data caching are adopted to further improve system performance.

Compared with the centralized RAID architecture, MT2RAID offers a better price-performance ratio and greater scalability. However, the resources of each storage unit in MT2RAID are limited. To maximize MT2RAID performance, this paper proposes a global unified cache management scheme that makes full use of free memory in other storage units and reduces disk I/Os. In addition, the global cache space grows as the number of storage units in MT2RAID increases.

With the increasing demands of applications, MT2RAID should provide flexible storage services. MT2RAID manages storage space as a virtual storage pool and allocates space dynamically according to application demand. This improves the utilization of each storage unit's space and prevents any single storage unit from becoming a performance bottleneck.

Data protection techniques such as RAID, data replication and remote mirroring can protect data against disk or site failures, but they cannot prevent data loss caused by software defects, virus attacks, user errors and the like. Building on the hierarchical architecture of MT2RAID, this paper combines backup, snapshot and continuous data protection to provide multi-tier data recovery. To reduce the write performance penalty of existing copy-on-write snapshot and continuous data protection methods, we propose multi-threaded parallel execution and log disk optimizations. Analysis and experimental results show that these optimizations substantially improve write performance.

Energy saving is a current trend in data centers. For a large-scale storage architecture such as MT2RAID, power consumption is also a concern.
Existing energy-saving methods based on disk layout focus mainly on the hit ratio and ignore the effect of misses on the disks, so they achieve only limited energy savings. Based on workload analysis, this paper proposes a ClusteRing-based Energy-Efficient scheme for Disk arrays (CREED). By concentrating popular and correlated data on a few active disks, CREED gives the inactive disks longer idle periods. CREED also effectively reduces disk arm movement and rotational latency, thereby reducing both energy consumption and data access latency.
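
The three data layouts named in the abstract (MT2RAID-S, MT2RAID-M and MT2RAID-P) can be illustrated with a minimal sketch. The abstract does not give the actual placement rules, so everything below is an assumption made for illustration: round-robin striping, a "unit k mirrors to unit k + n/2" pairing, single XOR parity per stripe, and all function names are hypothetical rather than taken from the dissertation.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def layout_striped(blocks: List[bytes], num_units: int) -> List[List[bytes]]:
    """MT2RAID-S-like layout: spread blocks round-robin, no redundancy."""
    units: List[List[bytes]] = [[] for _ in range(num_units)]
    for i, block in enumerate(blocks):
        units[i % num_units].append(block)
    return units


def layout_mirrored(blocks: List[bytes], num_units: int) -> List[List[bytes]]:
    """MT2RAID-M-like layout: unit k and unit k + num_units // 2 hold
    the same blocks (assumed pairing rule, not from the source)."""
    if num_units % 2 != 0:
        raise ValueError("this sketch needs an even number of units")
    half = num_units // 2
    units: List[List[bytes]] = [[] for _ in range(num_units)]
    for i, block in enumerate(blocks):
        primary = i % half
        units[primary].append(block)         # primary copy
        units[primary + half].append(block)  # mirror copy
    return units


def make_parity_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """MT2RAID-P-like stripe: the data blocks followed by one XOR parity
    block, each element intended for a different storage unit."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe: List[bytes], lost_index: int) -> bytes:
    """Rebuild the block at lost_index from the surviving stripe members."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

In this sketch, losing any single element of a parity stripe (data or parity) is recoverable by XOR-ing the survivors, whereas the striped layout has nothing to recover from and the mirrored layout pays for recovery with double the storage.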
Keywords/Search Tags:RAID, Scalability, Modularity, Global cache management, Storage space virtualization, Data recoverability, Energy saving