| With the rapid development of cloud computing, big data, and the Internet of Things, business systems such as image recognition and gene sequencing applications generate a large number of small files every day, posing a great challenge to distributed storage systems. Ceph is an open-source distributed storage system, but its logging mechanism, multi-replica strategy, and complex data mapping process result in low storage efficiency when storing massive numbers of small files. In addition, the large number of addressing calculations in high-concurrency read-write scenarios can degrade the performance of the entire Ceph system. First, to address the low efficiency of writing massive numbers of small files to Ceph, an optimization strategy is proposed that introduces a small file middle processing layer (SFMPL). SFMPL sits between the application and the Ceph storage cluster and processes small files uploaded by users through multiple cooperating modules. It reduces the number of small files to be stored mainly through file deduplication and file merging, thereby reducing the number of disk I/O operations and the addressing calculation overhead of the storage system and improving the write efficiency of small files. Second, to improve the read performance of massive numbers of small files, a cache-based small file read strategy is proposed. This strategy improves read efficiency by introducing a small file cache. Because the LRFU cache replacement algorithm is inadequate for the workload studied in this paper, an improved LRFU-SIZE cache replacement algorithm is proposed. The LRFU-SIZE algorithm jointly considers a file’s access frequency, access time, and size, which improves the cache hit rate and thus the access efficiency of small files. Finally, a series of performance tests were conducted to compare the proposed optimization strategy with the native Ceph system. Experimental results show that the file write time with the SFMPL optimization strategy is reduced by more than 44% compared to direct writes to Ceph, and the cache-based small file read time is reduced by more than 52% compared to the native Ceph system. Multiple experiments demonstrate that the proposed optimization strategy can effectively improve the performance of accessing massive numbers of small files in Ceph. |
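
The write path described in the abstract (deduplicate, then merge small files before they reach the cluster) might be sketched as follows. This is only an illustrative toy under stated assumptions, not the SFMPL implementation: the class `SmallFileLayer`, the `merge_threshold` parameter, the use of SHA-256 digests for deduplication, and the in-memory dictionary standing in for the Ceph cluster are all hypothetical choices made here.

```python
import hashlib


class SmallFileLayer:
    """Toy dedup-and-merge layer; every name here is hypothetical."""

    def __init__(self, merge_threshold=4 * 1024 * 1024):
        self.merge_threshold = merge_threshold  # assumed flush size in bytes
        self.store = {}       # object_id -> merged bytes (stands in for the cluster)
        self.by_digest = {}   # content digest -> (object_id, offset, length)
        self.by_name = {}     # file name -> (object_id, offset, length)
        self.buffer = bytearray()  # merged object currently being filled
        self.next_object = 0       # id the current buffer will get on flush

    def write(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        loc = self.by_digest.get(digest)
        if loc is None:
            # New content: append it to the current merge buffer.
            loc = (self.next_object, len(self.buffer), len(data))
            self.buffer.extend(data)
            self.by_digest[digest] = loc
        # Duplicate content only adds an index entry, no new bytes.
        self.by_name[name] = loc
        if len(self.buffer) >= self.merge_threshold:
            self.flush()

    def flush(self):
        # One large write replaces many small ones.
        if self.buffer:
            self.store[self.next_object] = bytes(self.buffer)
            self.buffer = bytearray()
            self.next_object += 1

    def read(self, name):
        object_id, offset, length = self.by_name[name]
        if object_id == self.next_object:
            blob = self.buffer  # still buffered, not flushed yet
        else:
            blob = self.store[object_id]
        return bytes(blob[offset:offset + length])
```

In this sketch, writing three small files of which two are identical results in a single stored object, which is the effect the abstract attributes to deduplication and merging: fewer objects to address and fewer disk writes.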
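
The abstract does not give the LRFU-SIZE formula, so the following is only a guess at what a size-aware LRFU variant could look like. It uses the standard LRFU combined recency and frequency (CRF) value with the decay function F(x) = (1/2)^(λx), and then, purely as an assumption made here, divides that value by the file size so that large, rarely used files are evicted first. The class `LrfuSizeCache` and the parameters `capacity_bytes` and `lam` are hypothetical.

```python
class LrfuSizeCache:
    """Size-aware LRFU sketch; the CRF / size priority is an assumption."""

    def __init__(self, capacity_bytes, lam=0.1):
        self.capacity = capacity_bytes
        self.lam = lam        # decay rate: 0 behaves like LFU, 1 closer to LRU
        self.used = 0
        self.clock = 0        # logical time, one tick per get/put
        self.entries = {}     # name -> [data, crf, last_access]

    def _decay(self, age):
        # Standard LRFU weighting function F(x) = (1/2) ** (lam * x).
        return 0.5 ** (self.lam * age)

    def _priority(self, name):
        data, crf, last = self.entries[name]
        # CRF decayed to the current time, divided by size (assumed form).
        return crf * self._decay(self.clock - last) / len(data)

    def get(self, name):
        self.clock += 1
        entry = self.entries.get(name)
        if entry is None:
            return None  # cache miss
        # LRFU update: new CRF = 1 + decayed old CRF.
        entry[1] = 1.0 + entry[1] * self._decay(self.clock - entry[2])
        entry[2] = self.clock
        return entry[0]

    def put(self, name, data):
        self.clock += 1
        if not data or len(data) > self.capacity:
            return False  # empty or oversized files are not cached
        if name in self.entries:
            # Replacing an entry resets its CRF in this sketch.
            self.used -= len(self.entries.pop(name)[0])
        while self.used + len(data) > self.capacity:
            # Evict the entry with the lowest size-adjusted priority.
            victim = min(self.entries, key=self._priority)
            self.used -= len(self.entries.pop(victim)[0])
        self.entries[name] = [data, 1.0, self.clock]
        self.used += len(data)
        return True
```

With this priority, a small file that has been read recently outranks a larger file that has only been inserted, so the larger file is the one evicted when space runs out. Whether the paper weights size in this way is not stated in the abstract.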