| With the rapid development of computer technology, the demands placed on computer storage keep rising. Storage systems are trending toward mass capacity, low cost, and high performance, yet no single kind of storage device can meet all of these demands because of its intrinsic characteristics. A hybrid storage system, which exploits the strengths of different types of storage devices, can significantly expand storage capacity and also significantly improve storage performance while keeping system cost low. Hybrid storage is therefore a development direction for storage technology. Against this background, and in line with the trend toward large capacity, low cost, and high performance, this paper exploits the high speed of SSDs and the strong sequential-access performance of HDDs to propose a new hybrid storage system named RHM. In this system, considering both system bandwidth and latency, we place hot and randomly accessed data on the small, fast SSD, and other data, particularly sequentially accessed data, on large hard disks. The system is implemented as a kernel module in the Linux OS and is transparent to upper-layer applications. The paper covers the following topics. First, the read and write characteristics, capacity, and cost of current mainstream persistent storage devices are studied, and the key technical problems in hybrid storage system design are analyzed in depth. Second, previous hot data identification schemes mostly focus on data access frequency but do not dynamically reflect changes in it, which leads to a high misidentification rate. To identify hot data effectively, a new hot data identification scheme based on linear table counting is proposed. The scheme can capture precise recency (how close an access is to the current time) as
well as frequency, which improves the accuracy of hot data identification. Moreover, to evaluate the randomness of each individual file, a randomness calculation is put forward. Based on file access history, the model logically merges sequential I/O requests that arrive within a short time window. After merging, the total number of access segments remaining for a given file is defined as the randomness count of that file. Third, a data migration model based on hot data and random data identification is presented. We use a 0-1 knapsack model and an improved greedy algorithm to allocate or migrate files between the hard disks and the SSD. Compared with the traditional greedy algorithm, the proposed algorithm can make full use of the limited capacity of the SSD. Finally, three real-world workloads are used to test and compare the performance of three storage systems. Experiments demonstrate that RHM improves overall I/O throughput by up to 42% and latency by up to 23% compared with a storage system using HDDs only. Accordingly, overall storage performance is improved. |
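
The randomness count described in the abstract can be illustrated with a minimal Python sketch. It is not the RHM kernel-module code: the request format `(timestamp, offset, length)`, the function name `randomness_count`, the 0.5-second default window, and the rule that a request is sequential only when it starts exactly where the previous one ended are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical request record: (timestamp_seconds, offset_bytes, length_bytes)
Request = Tuple[float, int, int]


def randomness_count(requests: List[Request], window: float = 0.5) -> int:
    """Count the access segments left after merging sequential requests.

    A request is merged into the previous segment when it starts exactly
    where the previous request ended and arrives within `window` seconds
    of it. Both criteria are assumptions of this sketch.
    """
    segments = 0
    prev_time = None
    prev_end = None
    for ts, offset, length in sorted(requests):  # order by arrival time
        sequential = (
            prev_end is not None
            and offset == prev_end
            and ts - prev_time <= window
        )
        if not sequential:
            segments += 1  # a new, non-mergeable access segment begins
        prev_time, prev_end = ts, offset + length
    return segments


# Three back-to-back reads merge into one segment; a late read and a
# read at an unrelated offset each start a new segment.
trace = [
    (0.0, 0, 4096),
    (0.1, 4096, 4096),
    (0.2, 8192, 4096),
    (5.0, 12288, 4096),
    (5.1, 100000, 4096),
]
print(randomness_count(trace))
```

Under these assumptions, a file read strictly sequentially in one burst yields a count of 1, while a file accessed at scattered offsets yields a count close to its number of requests.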
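
The abstract does not spell out the improved greedy algorithm, so the following Python sketch shows only one common way a greedy heuristic for a 0-1 knapsack can use capacity more fully: instead of stopping at the first file that does not fit, it keeps scanning for smaller files that still fit. The `FileInfo` type, the `benefit` score (standing in for some combination of hotness and randomness), and the function `select_for_ssd` are hypothetical names, not the paper's definitions.

```python
from typing import List, NamedTuple


class FileInfo(NamedTuple):
    name: str
    size: int       # bytes the file would occupy on the SSD
    benefit: float  # hypothetical score, e.g. derived from hotness and randomness


def select_for_ssd(files: List[FileInfo], capacity: int) -> List[str]:
    """Greedy 0-1 knapsack heuristic: pick files by benefit per byte.

    Unlike a greedy pass that stops at the first file that does not fit,
    this sketch skips that file and keeps filling the remaining space.
    """
    candidates = [f for f in files if f.size > 0]
    ranked = sorted(candidates, key=lambda f: f.benefit / f.size, reverse=True)
    chosen = []
    free = capacity
    for f in ranked:
        if f.size <= free:  # skip files that no longer fit, keep scanning
            chosen.append(f.name)
            free -= f.size
    return chosen


files = [
    FileInfo("a", 60, 90.0),  # density 1.5
    FileInfo("b", 50, 60.0),  # density 1.2, does not fit after "a"
    FileInfo("c", 40, 40.0),  # density 1.0, fills the remaining space
]
print(select_for_ssd(files, capacity=100))
```

In this example a stop-at-first-miss greedy pass would place only file `a` and leave 40 units of SSD capacity unused, whereas the scan-on variant also places file `c`. Like any greedy heuristic, it does not guarantee the optimal knapsack solution.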