| With the development of information technology, geographic information technology plays an important role in daily life, for example in navigation systems and in locating hotels, restaurants, and entertainment venues. These applications are built on geographic information systems, whose construction relies on remote sensing image data. Remote sensing image data differs from ordinary small files in that each image carries unique geographic coordinates. Most existing storage systems are designed for large files or for general small objects and cannot index remote sensing image data by geographic coordinates, which greatly reduces access speed. This thesis designs a distributed storage system for remote sensing images. The research contents are as follows: (1) To meet the demand for reading and writing remote sensing images, this thesis establishes an index on geographic coordinates. It draws on the S2 algorithm, an existing encoding technique for remote sensing images, and designs a metadata index on top of it that merges and stores data with similar geographic locations. (2) To meet the demand for metadata storage, this thesis designs a storage engine with K/V separation that satisfies the read and write requirements of metadata. To address the read and write amplification caused by storing large values in a K/V storage engine, it draws on the existing LSM-Tree storage engine and redesigns the K/V separation architecture. It also designs a garbage collection mechanism that reduces the impact of background garbage collection on foreground user read and write traffic. (3) To meet the demand for massive data access, a cache node is designed so that read and write requests do not each trigger unnecessary operations on the underlying storage media. The 2Q algorithm is used to identify hot data in the system. (4) To solve the task scheduling problem in the system, the same-task-priority matching principle and the node-first matching principle are used for task scheduling. Read and write tasks for the same data area are assigned to the same cache node whenever possible, so that cached data is used effectively. (5) To solve the single-point-of-failure problem of the distributed storage system, a primary-standby architecture is designed. The primary node performs task management and scheduling, and shared storage and synchronous replication are used to synchronize data between the primary node and the standby node. Finally, this thesis carries out functional and performance tests. Compared with the Ceph storage system, our system performs better when reading and writing remote sensing image data stored as small files. |
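The hot-data identification in item (3) can be illustrated with a minimal sketch of a simplified 2Q cache. This is not the thesis's implementation: the class name `TwoQCache`, the parameters `kin_ratio` and `kout_ratio`, and their default values are assumptions made for illustration only. The sketch keeps first-time entries in a FIFO queue (`a1in`), remembers recently evicted keys in a ghost queue (`a1out`), and promotes a key to the LRU-managed hot queue (`am`) only when it is written again while its key is still in the ghost queue.

```python
from collections import OrderedDict


class TwoQCache:
    """Simplified 2Q cache (illustrative sketch, not the thesis's code)."""

    def __init__(self, capacity, kin_ratio=0.25, kout_ratio=0.5):
        # kin_ratio and kout_ratio are assumed tuning knobs.
        self.capacity = capacity
        self.kin = max(1, int(capacity * kin_ratio))    # target size of a1in
        self.kout = max(1, int(capacity * kout_ratio))  # max size of a1out
        self.a1in = OrderedDict()   # FIFO of first-time entries: key -> value
        self.a1out = OrderedDict()  # ghost FIFO of evicted keys: key -> None
        self.am = OrderedDict()     # LRU of hot entries: key -> value

    def get(self, key):
        if key in self.am:
            self.am.move_to_end(key)  # refresh recency of hot data
            return self.am[key]
        if key in self.a1in:
            return self.a1in[key]     # hit in FIFO: position unchanged
        return None                   # miss (a1out stores keys only)

    def put(self, key, value):
        if key in self.am:
            self.am[key] = value
            self.am.move_to_end(key)
        elif key in self.a1in:
            self.a1in[key] = value
        elif key in self.a1out:
            # Seen again shortly after eviction: treat as hot data.
            del self.a1out[key]
            self._make_room()
            self.am[key] = value
        else:
            self._make_room()
            self.a1in[key] = value

    def _make_room(self):
        if len(self.a1in) + len(self.am) < self.capacity:
            return
        if len(self.a1in) >= self.kin:
            # Evict the oldest first-time entry and remember its key.
            old_key, _ = self.a1in.popitem(last=False)
            self.a1out[old_key] = None
            if len(self.a1out) > self.kout:
                self.a1out.popitem(last=False)
        else:
            # Evict the least recently used hot entry.
            self.am.popitem(last=False)
```

In a cache node of the kind described above, the key might be the encoded cell identifier of an image tile and the value the tile's bytes. A one-off sequential scan over many tiles then only cycles through `a1in` and leaves the hot entries in `am` untouched.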