| The flash-based Solid State Disk (SSD), which is composed of multiple flash chips, is emerging as a promising nonvolatile storage device. Compared with the traditional Hard Disk Drive (HDD), an SSD offers higher performance, lower energy consumption and higher reliability. SSDs are now widely deployed in modern computing systems, from low-end personal computers and mid-range servers to high-end supercomputers. Flash memory has two key and unique characteristics, namely write-after-erase and a limited number of erase cycles. A write operation can only change the value of a target bit from one to zero. Once a page has been written, it must be erased, which resets all of its bits to one, before the next write operation can be performed on the same page. Each flash block can endure only a limited number of erase cycles before it wears out, after which it can no longer store any data. A typical MLC flash has an erase-cycle limit of about 10K, while a typical SLC flash has an erase-cycle limit of about 100K. To accommodate these intrinsic flash characteristics, specialized SSD hardware architectures and software systems have been proposed. There are four levels of parallelism inside an SSD: channel-level, chip-level, die-level and plane-level parallelism. Exploiting this multi-level parallelism is the key to improving SSD performance. Several factors affect how effectively the parallelism inside an SSD is used, including advanced flash commands and allocation schemes. Advanced flash commands are provided by flash manufacturers to execute read/write/erase operations efficiently. For example, the multi-plane command exploits plane-level parallelism by executing multiple read/write/erase operations concurrently in multiple planes, while the interleave command exploits die-level parallelism by executing read/write/erase operations in a pipelined fashion across several dies. There are several kinds of allocation schemes in SSDs. 
Allocation schemes exploit channel-level and chip-level parallelism. In this paper, I investigate the relationship between the levels of parallelism, the advanced commands and the allocation schemes, and determine the priority order of these levels that optimizes the performance and endurance of an SSD. My experimental results show that the optimal priority order of parallelism in an SSD is (1) channel-level parallelism, (2) die-level parallelism, (3) plane-level parallelism and (4) chip-level parallelism. The Flash Translation Layer (FTL) is one of the most important components of an SSD; its main purpose is to translate logical addresses to physical addresses while adapting to the unique physical characteristics of flash memory technology. Two novel FTL algorithms are presented in this paper, namely a three-level page-mapping FTL scheme and a hiding address translation FTL scheme. The former exploits the characteristics of the SSD hardware and divides a plane into several parts called block-groups, each of which has a fixed number of physical blocks. In this scheme, a series of logical pages is stored in a block-group, and inside the block-group the mapping between logical pages and physical pages is fully associative. This scheme significantly reduces the size of the mapping table while delivering performance comparable to that of a pure page-mapping scheme. The latter achieves the performance of a pure page-mapping FTL at the RAM cost of a block-mapping FTL, while consuming less energy, by hiding the address translation. The basic idea of this scheme is to create a separate access path for reading and writing the address-mapping information, which hides most of the address-translation latency, by incorporating a low-energy solid-state memory device that stores the entire page-mapping table. The buffer cache of an SSD plays an essential role in bridging the speed gap between the flash storage media and the host interface. 
While existing SSD buffer management schemes are designed to improve SSD performance, they are often ineffective at serving the bursts of I/O traffic that are widespread in data-intensive workloads. To address this problem, this paper proposes a Proactive and Adaptive SSD buffer Scheme (PASS) that judiciously and actively flushes dirty data in anticipation of traffic bursts by exploiting light-traffic intervals as well as the chip-level and channel-level parallelism inside the SSD. The experimental results show that PASS significantly and consistently outperforms state-of-the-art buffer schemes in both response time and endurance. SSDsim is an event-driven, modularly structured, multi-tiered and open-source SSD simulator. It is capable of simulating most SSD hardware platforms, mainstream FTL schemes, allocation schemes, buffer management algorithms and request scheduling algorithms. The three-tiered SSDsim design consists of the buffer module at the top, the FTL and allocation module in the middle, and the low-level hardware platform module at the bottom. Given block-level trace files and configured parameter files, SSDsim reports the waiting time, processing time and response time of each request, the total erase count, the buffer hit count and other detailed information. To validate the accuracy of SSDsim, a real SSD hardware prototype has been implemented. The average response time obtained from SSDsim is very close to that obtained from the prototype, which indicates the high accuracy of SSDsim. |
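
The priority order of the four parallelism levels reported in the abstract (channel, then die, then plane, then chip) can be illustrated with a small static allocation sketch. This is only an illustration under stated assumptions, not the allocation scheme implemented in the paper or in SSDsim: the function name `allocate`, the `FlashLocation` type and the geometry parameters are hypothetical, and `chips` is assumed to mean chips per channel. Consecutive logical page numbers are spread across the highest-priority level first.

```python
from typing import NamedTuple


class FlashLocation(NamedTuple):
    """Hypothetical physical target of a logical page."""
    channel: int
    chip: int
    die: int
    plane: int


def allocate(lpn: int, channels: int, chips: int, dies: int, planes: int) -> FlashLocation:
    """Map a logical page number to a flash location.

    Assumed priority: channel > die > plane > chip, so consecutive
    pages hit different channels first and different chips last.
    """
    channel = lpn % channels   # highest priority: varies fastest
    rest = lpn // channels
    die = rest % dies          # second priority
    rest //= dies
    plane = rest % planes      # third priority
    rest //= planes
    chip = rest % chips        # lowest priority: varies slowest
    return FlashLocation(channel, chip, die, plane)


if __name__ == "__main__":
    # Example geometry (assumed): 2 channels, 2 chips per channel,
    # 2 dies per chip, 2 planes per die.
    for lpn in range(16):
        print(lpn, allocate(lpn, channels=2, chips=2, dies=2, planes=2))
```

With this ordering, a burst of sequential pages is spread over all channels before any second die is touched, and a second chip on the same channel is used only after every die and plane of the first chip has been assigned a page.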