Design And Implementation Of Graph Data Loading Tool

Posted on: 2020-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: W J Li
GTID: 2428330599958997
Subject: Computer technology
Abstract/Summary:
With the arrival of the big data era, large amounts of data are generated every day. These data are large in scale and varied in type, falling broadly into structured and unstructured data. Among unstructured data, graph data has wide practical application because it is highly expressive and well suited to representing complex relationships. Starting from the functional and performance requirements of graph data loading, this thesis designs and implements an efficient distributed graph data loading tool built on the Spark distributed computing framework.

The tool is divided into five functional modules: reading loading-step files, parsing multi-format data source files, loading vertex data, loading edge data, and generating and loading association table data. Reading the loading-step files serves the whole loading process: parsing these XML files yields the key loading parameters, which are then passed to each loading step. Parsing multi-format files extracts data from the data sources, chiefly CSV files and tables in relational databases. Vertex loading reads data from a source, applies a series of transformations to convert it into the required data structure, and then performs full or incremental loading of the vertex data; at the same time it builds an index of the vertex data, which supports the subsequent loading of edge data. Edge loading reads data from a source, checks it against the vertex index, and, after a series of transformations, performs full or incremental loading of the edge data. Association table generation takes the edge data that has been cleaned against the vertex index, transforms it into association table data, and stores that data in the graph database.

Functional and performance tests of the loader show that the system largely meets its functional and performance requirements and completes the whole loading process with relatively high efficiency.
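The abstract says the XML loading-step files supply the key loading parameters to every later step, but it does not give their schema. The sketch below assumes a hypothetical layout (a `<loading>` root holding `<step>` elements with `name`, `type` and `mode` attributes and a nested `<source>` element) and parses it with Python's standard `xml.etree.ElementTree`. The element and attribute names and the `LoadStep` class are illustrative only, not the thesis's actual format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class LoadStep:
    """One step read from a (hypothetical) XML loading-step file."""
    name: str
    kind: str   # assumed values: "vertex", "edge" or "association"
    mode: str   # assumed values: "full" or "incremental"
    source: dict = field(default_factory=dict)  # attributes of the <source> element


def parse_steps(xml_text: str) -> list[LoadStep]:
    """Parse the assumed <loading><step .../></loading> layout into LoadStep objects."""
    root = ET.fromstring(xml_text)
    steps = []
    for elem in root.findall("step"):
        src = elem.find("source")
        steps.append(LoadStep(
            name=elem.get("name"),
            kind=elem.get("type"),
            mode=elem.get("mode", "full"),  # default to a full load if unspecified
            source=dict(src.attrib) if src is not None else {},
        ))
    return steps


EXAMPLE = """<loading>
  <step name="load_person" type="vertex" mode="full">
    <source format="csv" path="person.csv" idColumn="id"/>
  </step>
  <step name="load_knows" type="edge" mode="incremental">
    <source format="table" table="knows"/>
  </step>
</loading>"""

for step in parse_steps(EXAMPLE):
    print(step.name, step.kind, step.mode, step.source)
```

Turning each step into a small typed record means downstream steps receive their parameters as fields rather than re-reading the XML.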
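For the multi-format source module, one plausible approach is to normalize both CSV files and relational tables into the same list-of-dicts row shape, so later steps need not care where rows came from. This is a single-process sketch using Python's `csv` and `sqlite3` modules as stand-ins; the thesis tool runs on Spark and its actual readers are not described in the abstract. The `format`/`path`/`table` keys and the `read_source` function are hypothetical.

```python
import csv
import sqlite3
from typing import Optional


def read_csv(path: str) -> list[dict]:
    """Read a CSV file with a header row; every value comes back as a string."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def read_table(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Read all rows of a relational table, using column names as dict keys."""
    # The table name is quoted as an identifier; it is assumed to come from trusted config.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def read_source(source: dict, conn: Optional[sqlite3.Connection] = None) -> list[dict]:
    """Dispatch on the (hypothetical) 'format' key so callers get one row shape."""
    fmt = source.get("format")
    if fmt == "csv":
        return read_csv(source["path"])
    if fmt == "table":
        if conn is None:
            raise ValueError("a database connection is required for table sources")
        return read_table(conn, source["table"])
    raise ValueError(f"unsupported source format: {fmt!r}")
```

Note that CSV values arrive as strings while SQLite returns typed values; a real loader would apply a type-conversion step after this point.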
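Vertex loading with full and incremental modes, plus the vertex index that edge loading later relies on, might look like the following. This is a plain-Python illustration under assumed semantics, not the thesis's Spark implementation: full mode discards the existing vertices and index, while incremental mode keeps them, assigns new internal IDs only to unseen business keys, and overwrites the properties of keys already present. `VertexStore` and its fields are hypothetical names.

```python
class VertexStore:
    """In-memory stand-in for a vertex store plus its key-to-ID index (hypothetical)."""

    def __init__(self):
        self.index = {}      # business key from the source -> internal vertex ID
        self.vertices = {}   # internal vertex ID -> property dict
        self._next_id = 0

    def load(self, rows, id_column, mode="full"):
        """Load rows as vertices and return how many new vertex IDs were assigned."""
        if mode == "full":
            # Full load: discard previous vertices and rebuild the index from scratch.
            self.index.clear()
            self.vertices.clear()
            self._next_id = 0
        elif mode != "incremental":
            raise ValueError(f"unknown mode: {mode!r}")
        created = 0
        for row in rows:
            key = row[id_column]
            vid = self.index.get(key)
            if vid is None:
                # Unseen key: allocate the next internal ID and record it in the index.
                vid = self._next_id
                self._next_id += 1
                self.index[key] = vid
                created += 1
            # Upsert: properties of an existing key are replaced by the newer row.
            self.vertices[vid] = {k: v for k, v in row.items() if k != id_column}
        return created


store = VertexStore()
store.load([{"id": "a", "name": "Ann"}, {"id": "b", "name": "Bob"}], "id")
store.load([{"id": "b", "name": "Bobby"}, {"id": "c", "name": "Cid"}], "id", mode="incremental")
print(store.index)  # {'a': 0, 'b': 1, 'c': 2}
```

Keeping the index as a separate key-to-ID map lets a later edge step resolve endpoints without rescanning the vertex data.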
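The edge step's check against the vertex index can be illustrated as follows: each edge's source and destination keys are looked up in an assumed key-to-internal-ID map, edges with a missing endpoint are set aside rather than loaded, and full mode starts from an empty edge set while incremental mode extends an existing one. The `src`/`dst` column names, the `load_edges` function, and keying edges by their endpoint pair (which collapses parallel edges) are all assumptions made for brevity, not details from the thesis.

```python
def load_edges(rows, vertex_index, src_col="src", dst_col="dst", existing=None, mode="full"):
    """Resolve edge endpoints through the vertex index.

    Returns (edges, rejected): edges maps (src_vid, dst_vid) -> property dict,
    rejected lists source rows whose endpoint key is not in the index.
    """
    if mode == "full":
        edges = {}                      # full load: start from an empty edge set
    elif mode == "incremental":
        edges = dict(existing or {})    # incremental: extend a copy of the current edges
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    rejected = []
    for row in rows:
        src = vertex_index.get(row[src_col])
        dst = vertex_index.get(row[dst_col])
        if src is None or dst is None:
            # Dangling edge: at least one endpoint was never loaded as a vertex.
            rejected.append(row)
            continue
        props = {k: v for k, v in row.items() if k not in (src_col, dst_col)}
        edges[(src, dst)] = props       # same endpoint pair: the newer row wins
    return edges, rejected


index = {"a": 0, "b": 1, "c": 2}  # key -> internal ID, as a vertex step might produce
edges, rejected = load_edges(
    [{"src": "a", "dst": "b", "since": "2019"}, {"src": "a", "dst": "z"}], index)
print(edges)     # {(0, 1): {'since': '2019'}}
print(rejected)  # [{'src': 'a', 'dst': 'z'}]
```

Returning the rejected rows instead of silently dropping them makes it possible to report how much source data failed the index check.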
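The abstract does not define the association table's layout. Purely as an assumption for illustration, the sketch below treats it as a flat relational table with one row per cleaned edge, recording the edge label plus both endpoints' internal IDs and original keys, and writes it to an in-memory SQLite table standing in for the graph database's storage. `build_association_rows`, `load_association` and the `association` schema are hypothetical.

```python
import sqlite3


def build_association_rows(edges, vertex_index, edge_label):
    """Flatten cleaned edges into (label, src_vid, src_key, dst_vid, dst_key) rows.

    edges: iterable of (src_vid, dst_vid) pairs that already passed the vertex-index check.
    vertex_index: business key -> internal vertex ID.
    """
    key_of = {vid: key for key, vid in vertex_index.items()}  # reverse lookup: ID -> key
    return [(edge_label, src, key_of[src], dst, key_of[dst]) for src, dst in sorted(edges)]


def load_association(conn, rows):
    """Write association rows into a relational table (an assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS association ("
        "label TEXT, src_vid INTEGER, src_key TEXT, dst_vid INTEGER, dst_key TEXT)"
    )
    conn.executemany("INSERT INTO association VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
rows = build_association_rows({(0, 1), (1, 0)}, {"a": 0, "b": 1}, "knows")
load_association(conn, rows)
print(conn.execute("SELECT COUNT(*) FROM association").fetchone()[0])  # 2
```

Sorting the edges makes the generated rows deterministic, which simplifies comparing the output of repeated loads.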
Keywords/Search Tags: Distributed Processing Framework, Graph Data, Spark, Data Loading, Association Table