| In recent years, governments at home and abroad have begun to pay attention to the effective organization of data systems, including accelerating data integration and circulation, meeting the growing demand for innovative big data applications, strengthening the top-level design of integrated big data centers, and optimizing the layout of data center infrastructure, all of which contribute to building a "digital network" system. As data volumes grow rapidly, file storage, the most commonly used form of storage, accounts for the vast majority of the user market. Because traditional standalone file systems lack scalability, they reach a storage bottleneck under today's data volumes, which has led to wider use of distributed file systems. This thesis studies metadata management in distributed file systems and its implementation. Through theoretical proof and industrial practice, the key technologies of metadata management are designed and implemented from three aspects: distributed consistency protocols, standalone data access techniques, and file system metadata representation and structured storage strategies. How to improve data access efficiency on given standalone hardware is an urgent problem for the industry. By studying two data structures, namely the Bloom filter and the succinct range filter, this thesis optimizes their construction and lookup efficiency, derives the time and space complexity of the corresponding algorithms, and analyzes the difference between the false positive rate and performance in actual use, which provides a solid theoretical basis for practical application. In a distributed setting, log and state machine compression in distributed consensus algorithms requires more detailed consideration and accurate theoretical support. Building on the Raft consensus algorithm, this thesis makes several modifications and optimizations to the algorithm, and an innovative log compression algorithm for an
LSM-based state machine is implemented. The algorithm not only ensures high fault tolerance and strong data consistency under extreme conditions such as machine downtime and network partitions, but also enables efficient state machine recovery and snapshot data transmission. Distributed storage systems must meet extremely high performance requirements. This thesis also studies several algorithms and techniques for improving performance and addresses performance problems in file system interface adaptation, metadata management, and data I/O for the distributed file storage system, so that the designed system satisfies the requirements of high throughput and low latency. |
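
To make the comparison between a filter's false positive rate and its behaviour in use more concrete, the following is a minimal Python sketch of a plain Bloom filter together with the standard approximation of its false positive rate, (1 - e^(-kn/m))^k for m bits, k hash functions, and n inserted keys. It is not the optimized filter developed in this thesis. The class name `BloomFilter`, the helpers `theoretical_fpr` and `measured_fpr`, and the choice of SHA-256 with double hashing are all assumptions made for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter using double hashing (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def theoretical_fpr(num_bits: int, num_hashes: int, num_keys: int) -> float:
    """Standard approximation: (1 - exp(-k * n / m)) ** k."""
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


def measured_fpr(bf: BloomFilter, absent_keys) -> float:
    """Fraction of keys known to be absent that the filter reports as present."""
    absent_keys = list(absent_keys)
    hits = sum(1 for key in absent_keys if bf.might_contain(key))
    return hits / len(absent_keys)


if __name__ == "__main__":
    bf = BloomFilter(num_bits=10_000, num_hashes=7)
    for i in range(1_000):
        bf.add(f"present-{i}".encode())
    absent = (f"absent-{i}".encode() for i in range(10_000))
    print("theoretical:", theoretical_fpr(10_000, 7, 1_000))
    print("measured:   ", measured_fpr(bf, absent))
```

Running the script prints the predicted rate next to the rate observed on keys that were never inserted. The gap between the two, and the lookup cost paid to achieve it, is the kind of difference between theory and practice that the abstract refers to.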