Font Size: a A A

Design And Implementation Of Distributed Graph Database Operators

Posted on:2024-05-31Degree:MasterType:Thesis
Country:ChinaCandidate:H LiFull Text:PDF
GTID:2568307079476414Subject:Electronic information
Abstract/Summary:PDF Full Text Request
With the rapid increase of related data generated by the Internet,there is a demand for the storage and processing of relational networks.Although the traditional relational database can realize the relationship through the join operator,the relational database is no longer suitable to deal with the relational network when the associated data is very large.However,graph database,with its unique topological storage,can deal with this kind of associative query efficiently,so graph database is a popular research direction in recent years.In this thesis,the design and implementation of distributed graph database operator is the research objective.Major contributions and innovations are as follows:1.Design and implement the basic operator of graph database and give correctness test.Including filter,sort,join and graph traversal operators.2.Vector optimization was carried out based on vector instruction set,and its speed was quantified by experiments.In the sorting operator,a hybrid sorting algorithm framework based on cache awareness and vector instruction set is proposed.The algorithm is divided into three stages: in-register sorting,in-cache merging and out-of-cache merging.Different sorting algorithms are used at each stage.Experiments show that the performance of this algorithm is more than 3 times that of std::sort.In the join operator,a novel algorithm based on vector instruction set Bloom filter is presented,whose performance is 3.5 times that of scalar algorithm.In the graph traversal operator,a breadth-first traversal algorithm based on vector instruction set optimization is presented,and its performance is much improved compared with the scalar implementation.3.The design scheme of operator based on data partition and multi-threading is given.In the traditional scheme,the task is evenly divided according to the number of threads supported by the hardware,but because the operator task has different data distribution,the execution time of each subtask is also different.In this paper,according to the characteristics of operator tasks,a task segmentation and parallel execution framework based on operator characteristics is proposed to maximize the parallel execution of multiple cores.Experiments show that the performance of this scheme is 1.1 times that of the traditional scheme.4.A distributed design scheme for sorting and joining operators is presented.Among distributed sorting operators,a distributed sorting framework based on multiple scheduling nodes is presented innovatively.In distributed join operator,the vector optimization scheme of 2-phase track join is discussed.This thesis compares the algorithm implementations optimized by instruction-level pipeline parallelism and vector instruction set based on the lab’s self-developed distributed graph database,using the basic scalar algorithm implementation of each operator as a benchmark,all of which have achieved good performance.
Keywords/Search Tags:distributed graph database, operator, vector instruction set, parallelism
PDF Full Text Request
Related items