
Research on Performance Optimization and Its Reusability for Managing Massive Numbers of Small Files

Posted on: 2018-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: J Cheng
GTID: 2428330569485407
Subject: Computer technology
Abstract:
With the development of big data, the volume of information and data generated by human activity has grown explosively, and the number of small files in particular is growing exponentially. When storing and accessing such massive numbers of small files, traditional distributed file systems suffer from a series of problems: inefficient metadata structures, inefficient disk I/O, low disk-space utilization, and high network latency. Solving these problems is a major challenge.

To address them, this thesis designs and implements a high-performance, scalable management strategy for small files. So as not to alter the structure of the file system itself or the way merged small files are processed, the small-file management strategy is extracted into a separate module that sits in front of the file system and provides file merging and caching; all remaining work is then handed over to the file system, which improves the performance of the system as a whole. To meet different user needs, the system offers three file-merging strategies: merging by time, merging by data source, and merging based on association rule mining. A hash-based trie algorithm replaces LRU, the traditional cache replacement algorithm, which improves the cache hit rate and addresses the problems of inefficient disk I/O and high network latency. Finally, the small-file management strategy is designed and implemented as a plugin, which makes it reusable and extensible and lets users apply it to different file systems.

Experimental results show that each of the three merging strategies has its own advantages in particular scenarios. Compared with the traditional strategy, all three reduce memory usage by 91% and improve access performance by 46%, 58%, and 64%, respectively. In addition, the new cache replacement algorithm performs 69% better than the traditional LRU algorithm. Finally, the experiments also show that the proposed small-file management strategy can be used with different file systems, demonstrating its reusability and extensibility.
Keywords: File Merge, Association Rule Mining, Trie-Tree, Reuse Technology
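The abstract names merging by time and merging by data source but does not say how merged containers or their indexes are laid out. The following Python sketch assumes that each small file carries a creation timestamp and a source label, and that files sharing a grouping key are concatenated into one container with a per-file index entry of (container key, offset, length). `SmallFile`, `merge_by_key`, `by_time`, `by_source`, `read`, and the window length are hypothetical names chosen for illustration, not the thesis's implementation.

```python
from collections.abc import Callable, Hashable
from dataclasses import dataclass


@dataclass
class SmallFile:
    name: str
    source: str      # originating data source (assumed known per file)
    created: float   # creation time in seconds (assumed known per file)
    data: bytes


# name -> (container key, byte offset inside the container, length)
Index = dict[str, tuple[Hashable, int, int]]


def merge_by_key(files: list[SmallFile], key: Callable[[SmallFile], Hashable]):
    """Concatenate files that share a key into one container and index them."""
    containers: dict[Hashable, bytearray] = {}
    index: Index = {}
    for f in sorted(files, key=lambda f: f.created):
        k = key(f)
        blob = containers.setdefault(k, bytearray())
        index[f.name] = (k, len(blob), len(f.data))
        blob.extend(f.data)
    return {k: bytes(v) for k, v in containers.items()}, index


def by_time(window_seconds: float) -> Callable[[SmallFile], Hashable]:
    # Files created in the same fixed-length time window share a container.
    return lambda f: int(f.created // window_seconds)


def by_source(f: SmallFile) -> Hashable:
    # Files from the same data source share a container.
    return f.source


def read(containers: dict, index: Index, name: str) -> bytes:
    # Fetch one small file using its index entry, without scanning the container.
    k, offset, length = index[name]
    return containers[k][offset:offset + length]
```

For example, with files `a` (source `sensor-1`, t=5), `b` (`sensor-2`, t=30), and `c` (`sensor-1`, t=70), a 60-second window puts `a` and `b` in one container, while merging by source puts `a` and `c` together. Swapping the key function is all that separates the two strategies in this sketch.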
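For the association-rule strategy, the abstract does not state which mining algorithm is used (Apriori and FP-Growth are common choices). As a simplified stand-in, this sketch mines only two-item rules from a hypothetical log of access sessions (each session is the set of files read together), keeps a rule (x, y) when the pair's support and the rule's confidence clear assumed thresholds, and then uses union-find to place every file linked by a rule into the same merge group. `mine_pair_rules`, `group_by_rules`, and both thresholds are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(sessions, min_support, min_confidence):
    """Return rules (x, y): when x is accessed, y tends to be accessed too."""
    n = len(sessions)
    item_count = Counter()
    pair_count = Counter()
    for session in sessions:
        item_count.update(session)
        # Sorting makes each unordered pair map to one tuple key.
        pair_count.update(combinations(sorted(session), 2))
    rules = []
    for (a, b), together in pair_count.items():
        if together / n < min_support:  # support(a, b) = P(a and b in a session)
            continue
        for x, y in ((a, b), (b, a)):
            if together / item_count[x] >= min_confidence:  # confidence = P(y | x)
                rules.append((x, y))
    return rules


def group_by_rules(files, rules):
    """Union files linked by any rule; each resulting group is merged together."""
    parent = {f: f for f in files}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in rules:
        parent[find(x)] = find(y)
    groups = {}
    for f in list(parent):
        groups.setdefault(find(f), set()).add(f)
    return list(groups.values())
```

With sessions `{a, b}`, `{a, b, c}`, `{a, b}`, `{c, d}`, a minimum support of 0.5, and a minimum confidence of 0.8, only the pair (a, b) survives, so `a` and `b` share a container while `c` and `d` stay in their own groups.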
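The baseline in the abstract is LRU, which evicts whichever cached entry has gone unused the longest. A compact Python version built on `collections.OrderedDict` (its `move_to_end` and `popitem(last=False)` methods) is shown below. The hit and miss counters exist only so that hit rates can be compared, and `loader` is a hypothetical callback standing in for a read from the underlying file system.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.data[key]
        self.misses += 1
        value = loader(key)  # e.g. a disk or network read
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
        return value
```

With capacity 2, the access sequence a, b, a, c, b yields one hit and four misses: `c` evicts `b`, and the second `b` then evicts `a`.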
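The abstract does not describe the hash-based trie algorithm in detail, so the following is only one possible interpretation, not the thesis's method. It assumes that cached files are indexed in a trie over their path components, that each node keeps its children in a hash map (a Python `dict`), and, as a purely hypothetical eviction rule, that the cached file with the lowest access count is evicted when the cache is full. `TrieNode` and `HashTrieCache` are illustrative names. The eviction scan is linear in the number of trie nodes, which keeps the sketch short but would be too slow for a production cache.

```python
from typing import Callable, Optional


class TrieNode:
    __slots__ = ("children", "value", "count")

    def __init__(self):
        self.children: dict[str, "TrieNode"] = {}  # hash map: path component -> child
        self.value: Optional[bytes] = None  # cached file content, if any
        self.count = 0  # access count of the cached file at this node


class HashTrieCache:
    def __init__(self, capacity: int):
        assert capacity >= 1
        self.capacity = capacity
        self.root = TrieNode()
        self.size = 0
        self.hits = 0
        self.misses = 0

    def _node(self, path: str, create: bool) -> Optional[TrieNode]:
        # Walk (or build) the trie one path component at a time.
        node = self.root
        for part in path.strip("/").split("/"):
            nxt = node.children.get(part)
            if nxt is None:
                if not create:
                    return None
                nxt = node.children[part] = TrieNode()
            node = nxt
        return node

    def _evict(self) -> None:
        # HYPOTHETICAL rule: drop the cached file with the lowest access count.
        # Emptied nodes are left in the trie (no pruning) to keep the sketch short.
        victim, stack = None, [self.root]
        while stack:
            node = stack.pop()
            if node.value is not None and (victim is None or node.count < victim.count):
                victim = node
            stack.extend(node.children.values())
        victim.value, victim.count = None, 0
        self.size -= 1

    def get(self, path: str, loader: Callable[[str], bytes]) -> bytes:
        node = self._node(path, create=False)
        if node is not None and node.value is not None:
            node.count += 1
            self.hits += 1
            return node.value
        self.misses += 1
        value = loader(path)
        if self.size >= self.capacity:
            self._evict()
        node = self._node(path, create=True)
        node.value, node.count = value, 1
        self.size += 1
        return value
```

Unlike LRU, this rule keeps a frequently read file such as `/d/a` cached even after several newer but rarely read files arrive, which is one way a replacement policy could raise the hit rate on skewed small-file workloads.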
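The plugin idea can be sketched as a small-file manager that depends only on an abstract file-system interface, so that supporting a new file system means implementing that interface rather than changing the manager. In this Python sketch, `FileSystemBackend`, `InMemoryBackend`, `SmallFilePlugin`, the `batch_size` flush threshold, and the `/merged/container-N` naming scheme are all hypothetical; a real backend might wrap HDFS or another distributed file system.

```python
from abc import ABC, abstractmethod


class FileSystemBackend(ABC):
    """Hypothetical minimal interface a target file system must implement."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class InMemoryBackend(FileSystemBackend):
    """Toy backend used only to exercise the plugin."""

    def __init__(self):
        self.files: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def read(self, path: str) -> bytes:
        return self.files[path]


class SmallFilePlugin:
    """Merges small files into containers before handing them to any backend."""

    def __init__(self, backend: FileSystemBackend, batch_size: int = 3):
        self.backend = backend
        self.batch_size = batch_size
        self.pending: dict[str, bytes] = {}  # files not yet flushed
        self.index: dict[str, tuple[str, int, int]] = {}  # name -> (container, offset, length)
        self.containers = 0

    def put(self, name: str, data: bytes) -> None:
        self.pending[name] = data
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Write all pending files as one merged container and record their offsets.
        if not self.pending:
            return
        container = f"/merged/container-{self.containers}"
        self.containers += 1
        blob = bytearray()
        for name, data in self.pending.items():
            self.index[name] = (container, len(blob), len(data))
            blob.extend(data)
        self.backend.write(container, bytes(blob))
        self.pending.clear()

    def get(self, name: str) -> bytes:
        if name in self.pending:
            return self.pending[name]
        container, offset, length = self.index[name]
        return self.backend.read(container)[offset:offset + length]
```

Because `SmallFilePlugin` never touches a concrete file system, the same merging and indexing code runs unchanged against any class that implements `write` and `read`.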