Various industries have developed distributed file systems suited to their own fields, such as HDFS, GlusterFS, Haystack, and Ceph. After investigation and analysis, we found that these file systems are designed primarily for large-file storage; once large-scale small-file I/O is involved, their performance degrades sharply or they fail to work at all. Therefore, this paper starts from the storage format and the fault-tolerance mechanism, and optimizes the I/O performance of small files in a distributed storage system.

Existing file systems do not support small files well. To address this shortcoming, this paper describes a storage format and mechanism for small files on the data storage server: small files are packed into large data block files, each data block contains a large number of small files, and an index file identifies each file within its data block. Experimental results show that the optimized storage format achieves good I/O performance.

In a traditional file system, each file corresponds to one metadata record. Once large-scale small-file access is involved, this approach becomes a severe limitation. The optimized small-file storage format greatly reduces the amount of metadata, which makes caching the metadata feasible. This paper therefore introduces a cache mechanism into the distributed file system that reduces file access latency by keeping the index files in the cache.

This paper also introduces the use of erasure codes in small-file storage. Under the storage mechanism introduced in Chapter 2, the extended block occupies only a small amount of space, and when data is restored it is no longer necessary to read other files to decode and rebuild the damaged files. This greatly reduces both network overhead and computational overhead.
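To make the packing scheme concrete, the following is a minimal sketch, not the implementation described in this paper: small files are appended to one large data block file, and a fixed-size entry (name, offset, length) is written to a companion index file so that each small file can be located without its own per-file metadata record. All class, function, and field names here (BlockWriter, put, read_small_file, the index entry layout) are illustrative assumptions.

```python
import os
import struct

# Hypothetical index entry: 32-byte file name, 8-byte offset, 4-byte length.
INDEX_ENTRY = struct.Struct("<32sQI")

class BlockWriter:
    """Appends small files to one large data block and records each in an index file."""

    def __init__(self, block_path, index_path):
        self.block = open(block_path, "ab")
        self.index = open(index_path, "ab")

    def put(self, name, payload):
        # The small file starts at the current end of the data block.
        self.block.seek(0, os.SEEK_END)
        offset = self.block.tell()
        self.block.write(payload)
        # One fixed-size index entry per small file: (name, offset, length).
        self.index.write(INDEX_ENTRY.pack(name.encode()[:32], offset, len(payload)))
        return offset

    def close(self):
        self.block.close()
        self.index.close()

def read_small_file(block_path, offset, length):
    """Reads one small file back from the data block given its index entry."""
    with open(block_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

Because many small files share a single block and a single index file, the metadata the system must track shrinks from one record per file to roughly one record per block, which is what makes caching the index files practical.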