| In recent years, the processing and storage of massive data have been a popular area of research and exploration in both academia and industry. Infrastructure for data processing and storage has developed rapidly and diversified, providing targeted, optimized solutions for a wide range of scenarios. Object storage services on public clouds are currently the most convenient way to store unstructured data at large scale. However, requirements such as storing sensitive data have driven the rise of private and hybrid clouds, and with them the need for self-hosted object storage systems. This thesis designs and implements an object storage system that meets the needs of information technology enterprises for reliable and scalable unstructured data storage. The system gives applications a simple, abstract interface that simplifies data access. The interface is compatible with the API of Amazon S3, a widely used public object storage service, so existing applications can use it with little adaptation effort. The system uses a three-tier logical model of user spaces, storage areas, and objects, and the flatness of this model keeps the system simple. Architecturally, the system follows the principle of separating logic from storage: object storage logic, object metadata storage, and object data storage are handled by separate components, which reduces coupling within the system and makes the architecture more flexible. A hash-based approach identifies object data, establishes the mapping from objects to their data, and provides a data integrity check, which makes storage reliable. The thesis also provides a file-system-based storage implementation for object data that achieves scalability through partitioning and availability through replication. The system achieves scalability and reliability through a shared-nothing architecture and can be deployed on private or hybrid clouds. This not only reduces cost but also meets requirements such as autonomous control of sensitive data and low-latency access, so the system can serve as an object storage solution for enterprises in self-hosted cloud environments. |
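The abstract names three mechanisms for object data: hash-based identification with an integrity check, partitioning, and replication. The sketch below shows one plausible way these can fit together. It is a hypothetical illustration, not the thesis's implementation: SHA-256, the partition count, the `pNN/<digest>` directory layout, the `ContentAddressedStore` class, and the in-memory dictionary that stands in for the separate metadata store are all assumptions. Each replica root stands in for a separate file system or node.

```python
import hashlib
import os


class ContentAddressedStore:
    """Hypothetical sketch: object data is stored under its SHA-256 digest,
    placed in a partition derived from that digest, and copied to every
    replica root (each root stands in for a separate file system)."""

    def __init__(self, replica_roots, num_partitions=16):
        self.replica_roots = list(replica_roots)
        self.num_partitions = num_partitions
        # Assumed stand-in for the separate metadata store:
        # (user space, storage area, object key) -> data digest.
        self.metadata = {}

    def _path(self, root, digest):
        # Partition by the digest's leading hex digits so data spreads evenly.
        partition = int(digest[:8], 16) % self.num_partitions
        return os.path.join(root, f"p{partition:02d}", digest)

    def put(self, space, area, key, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        for root in self.replica_roots:
            path = self._path(root, digest)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            # Identical content maps to the same path, so it is written once per replica.
            if not os.path.exists(path):
                with open(path, "wb") as f:
                    f.write(data)
        # Object-to-data mapping: the object key points at the content digest.
        self.metadata[(space, area, key)] = digest
        return digest

    def get(self, space, area, key) -> bytes:
        digest = self.metadata[(space, area, key)]
        for root in self.replica_roots:
            try:
                with open(self._path(root, digest), "rb") as f:
                    data = f.read()
            except FileNotFoundError:
                continue  # replica missing: try the next one
            # Integrity check: recompute the hash and compare it with the stored digest.
            if hashlib.sha256(data).hexdigest() == digest:
                return data
        raise OSError(f"no intact replica for {key!r}")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        roots = [os.path.join(tmp, f"replica{i}") for i in range(3)]
        store = ContentAddressedStore(roots)
        store.put("alice", "photos", "cat.jpg", b"meow")
        print(store.get("alice", "photos", "cat.jpg"))  # b'meow'
```

Because the digest names the data, a reader can detect a corrupted replica by rehashing it and fall back to another copy. Keeping the key-to-digest mapping outside the data files mirrors the separation of metadata storage from data storage that the abstract describes.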