Knowledge graphs provide important support for semantic retrieval and reasoning. In recent years, knowledge graph data has grown rapidly in scale, and as demand for offline processing of this data increases, applications require storage systems with higher read/write efficiency and retrieval performance. To this end, this thesis proposes a semi-static knowledge graph storage system optimized for batch construction that supports hot-data partitioning and horizontal scaling. It then achieves compact data encoding through self-indexing compression and further optimizes the storage scheme and the data reorganization method.

First, an optimized storage mode for batch data construction is proposed for semi-static knowledge graphs. By providing an intermediate representation of graph data between the data warehouse and the graph database, it alleviates the problems of complex data processing pipelines, delayed data updates, and data hot spots. The knowledge graph is modeled with the property graph model, and the one-hop subgraph is used as the basic storage unit. A random-sampling-based strategy then realizes sharded storage of dense edges and a globally ordered partitioning of the data. Finally, support for full data builds and two asynchronous data reorganization modes keeps existing static data up to date while making incremental data available in a timely manner.

Second, a graph data storage method based on self-indexing compression is proposed to provide dedicated compression and storage for graph indexes and attribute data. Building on succinct data structures, namely the rMM-tree (range min-max tree) and DFUDS (depth-first unary degree sequence) compact tree encoding, the thesis proposes the DFUDS Trie as a compact representation of one-hop subgraph indexes and of vertex and edge attributes, with the underlying bit sequences stored and indexed as compressed bit vectors. An enumeration array based on compact representation is then used to store ordered attributes and the bidirectional relationships of one-hop subgraphs concisely. Compared with general-purpose compression methods, this approach gives up some encoding and retrieval efficiency but preserves the compression ratio of graph data while greatly reducing deserialization overhead. It thereby achieves self-indexing compression and improves the cold-start efficiency of the system.

Finally, an efficient method for merging multiple DFUDS Tries is proposed. Exploiting the compact storage of graph data, it further optimizes the multi-way data reorganization process. The method uses the physical contiguity of subtrees in the DFUDS encoding to implement a dynamic memory swap-in and swap-out strategy for DFUDS Tries during merging: memory is released and nodes are persisted as the merge progresses. This mitigates the excessive temporary space that merging multiple DFUDS Tries would otherwise require during graph data reorganization.

In summary, this thesis first proposes a semi-static knowledge graph storage system that optimizes the data structures and storage methods for knowledge graphs; it then introduces self-indexing compression into this system to achieve dedicated compression of graph data; finally, a multi-way merge method solves the problem of excessive memory overhead during data updates.
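The one-hop subgraph storage unit and its bidirectional edge relationships can be sketched as follows. The record layout is an assumption for illustration only, since the abstract does not specify it: each vertex's unit holds its properties plus sorted out-edges and in-edges, and `OneHopSubgraph` and `build_one_hop_subgraphs` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class OneHopSubgraph:
    """A vertex, its properties, and its incident edges in both directions (hypothetical layout)."""
    vertex_id: int
    properties: dict = field(default_factory=dict)
    out_edges: list = field(default_factory=list)  # (dst, label, edge_properties)
    in_edges: list = field(default_factory=list)   # (src, label, edge_properties)


def build_one_hop_subgraphs(vertices, edges):
    """Group a property graph into one-hop subgraphs keyed by vertex ID.

    vertices: {vertex_id: properties}; edges: iterable of (src, dst, label, properties).
    Every edge is recorded twice, as an out-edge of src and an in-edge of dst,
    so either endpoint's unit answers a one-hop query on its own.
    Assumes every edge endpoint appears in `vertices`.
    """
    units = {vid: OneHopSubgraph(vid, dict(props)) for vid, props in vertices.items()}
    for src, dst, label, props in edges:
        units[src].out_edges.append((dst, label, props))
        units[dst].in_edges.append((src, label, props))
    for unit in units.values():
        # Order edges by (neighbor, label); the sort is stable and never compares property dicts.
        unit.out_edges.sort(key=lambda e: (e[0], e[1]))
        unit.in_edges.sort(key=lambda e: (e[0], e[1]))
    return units
```

Storing each edge in both endpoints' units trades extra space for the ability to serve inbound and outbound one-hop queries from a single unit, which is the kind of redundancy the compact enumeration arrays described above aim to store cheaply.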
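The random-sampling strategy for globally ordered partitioning can be illustrated with a minimal sketch. It assumes edge keys are sortable `(src, dst)` pairs and that partition boundaries are evenly spaced quantiles of a random sample, a common range-partitioning approach; the thesis's exact rule is not given in the abstract, and `choose_split_points` and `assign_partition` are hypothetical names.

```python
import bisect
import random


def choose_split_points(keys, num_partitions, sample_size, seed=0):
    """Pick num_partitions - 1 range boundaries from a random sample of keys.

    Hypothetical helper: boundaries are evenly spaced quantiles of the sorted
    sample, so each partition receives approximately the same number of keys.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[(i * len(sample)) // num_partitions] for i in range(1, num_partitions)]


def assign_partition(key, split_points):
    """Partition p holds keys k with split_points[p-1] <= k < split_points[p]."""
    return bisect.bisect_right(split_points, key)


# Edge keys are (src, dst) pairs; vertex 0 is a dense (high-degree) vertex.
edges = [(0, d) for d in range(1000)] + [(v, 0) for v in range(1, 101)]
splits = choose_split_points(edges, num_partitions=4, sample_size=200)
parts = [assign_partition(e, splits) for e in edges]
```

Because this sketch samples edges rather than vertices, a dense vertex contributes many samples, so its edge range is split across several partitions instead of forming a single hot spot, while the partitions remain globally ordered by key.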
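DFUDS encodes an ordinal tree by writing each node's degree in unary during a preorder traversal. The sketch below assumes a trie given as nested Python dicts (edge label to child) and uses 1 for `(` and 0 for `)`. It shows the encoding and the property the merge step relies on: each subtree occupies a contiguous bit range. `dfuds_encode` is a hypothetical name, and the rMM-tree navigation structure that the DFUDS Trie builds over these bits is omitted.

```python
def dfuds_encode(trie):
    """Encode a trie (nested dicts: edge label -> child trie) in DFUDS order.

    Returns (bits, labels, spans):
      bits   -- 1 for '(' and 0 for ')', with one extra leading '(' so the sequence is balanced
      labels -- child edge labels; labels[j] belongs to the j-th '(' after the leading one
      spans  -- spans[v] is the half-open bit range of node v's subtree, v numbered in preorder
    """
    bits = [1]    # the extra leading '(' makes the whole sequence balanced
    labels = []
    spans = []

    def visit(node):
        v = len(spans)
        spans.append(None)          # reserve the slot so spans is indexed by preorder number
        start = len(bits)
        children = sorted(node)     # ordered trie: children sorted by label
        bits.extend([1] * len(children))  # degree written in unary ...
        bits.append(0)                    # ... and terminated by ')'
        labels.extend(children)
        for label in children:
            visit(node[label])
        spans[v] = (start, len(bits))

    visit(trie)
    return bits, labels, spans


bits, labels, spans = dfuds_encode({"a": {"c": {}}, "b": {}})
```

For this four-node trie the encoding is 8 bits (2n for n nodes), and the subtree rooted at `a` (preorder node 1) occupies bits 4 to 6 as one contiguous run, which is what allows a whole subtree to be written out or dropped from memory in one piece.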
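Succinct structures such as DFUDS are navigated with rank and select queries over the underlying bit vector. The following is a minimal sketch of those two operations over a plain, uncompressed bit vector with a sampled rank directory; compressed bit vectors (for example RRR-style encodings) additionally compress each block, which is omitted here. `RankSelectBitVector` and its `block` parameter are hypothetical.

```python
class RankSelectBitVector:
    """Plain bit vector with a sampled rank directory (hypothetical helper, no compression)."""

    def __init__(self, bits, block=64):
        self.bits = list(bits)
        self.block = block
        # samples[k] = number of 1s in bits[:k * block]
        self.samples = [0]
        count = 0
        for i, b in enumerate(self.bits):
            count += b
            if (i + 1) % block == 0:
                self.samples.append(count)
        self.ones = count

    def rank1(self, i):
        """Number of 1s in bits[:i]: one directory lookup plus a scan of fewer than `block` bits."""
        k = i // self.block
        return self.samples[k] + sum(self.bits[k * self.block:i])

    def select1(self, j):
        """Position of the j-th 1 (1-based), found by binary search over rank1."""
        if not 1 <= j <= self.ones:
            raise ValueError("j out of range")
        lo, hi = 0, len(self.bits) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) >= j:
                hi = mid
            else:
                lo = mid + 1
        return lo


bv = RankSelectBitVector([1, 1, 1, 0, 1, 0, 0, 0], block=4)
```

The directory costs one integer per block, so larger blocks save space at the price of longer scans; production libraries use multi-level directories and word-level popcount to make both operations effectively constant time.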
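The bounded-memory idea behind the multi-way merge can be illustrated with a streaming k-way merge that flushes its output in chunks. This sketch does not reproduce DFUDS-specific subtree swapping; it assumes each input is a run of `(key, value)` pairs sorted by key (for example, keys enumerated from a trie in order), with unique keys per run, and that on duplicate keys the run with the higher index (newer data) wins. `merge_sorted_runs`, `sink`, and `flush_every` are hypothetical names.

```python
import heapq
from itertools import groupby


def _tag(run, source):
    """Attach the source index so that equal keys sort oldest-first."""
    for key, value in run:
        yield key, source, value


def merge_sorted_runs(runs, sink, flush_every=1024):
    """Stream a k-way merge of sorted (key, value) runs into `sink` in bounded chunks.

    Assumptions (hypothetical, not from the thesis): each run is sorted by key with
    unique keys, and on duplicate keys the run with the higher index wins.
    Runs are consumed lazily, so memory holds roughly one pending item per run
    plus an output buffer of at most `flush_every` pairs.
    Returns the number of merged pairs written.
    """
    merged = heapq.merge(*(_tag(run, i) for i, run in enumerate(runs)))
    buffer = []
    written = 0
    for key, group in groupby(merged, key=lambda t: t[0]):
        *_, newest = group          # last item in the group has the highest source index
        buffer.append((key, newest[2]))
        if len(buffer) >= flush_every:
            sink(buffer)            # persist a chunk, then drop it from memory
            written += len(buffer)
            buffer = []
    if buffer:
        sink(buffer)
        written += len(buffer)
    return written


chunks = []
count = merge_sorted_runs([[(1, "a"), (3, "c")], [(2, "b"), (3, "C")], [(0, "z")]], chunks.append, flush_every=2)
```

Here `sink` stands in for whatever persists merged output (a file writer, for instance); flushing as the merge progresses is what keeps temporary memory bounded regardless of how large the merged result becomes.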