Design And Implementation Of Graph Data Loading Tool

Posted on:2020-02-10

Degree:Master

Type:Thesis

Country:China

Candidate:W J Li

Full Text:PDF

GTID:2428330599958997

Subject:Computer technology

Abstract/Summary:

With the arrival of the big data era, large amounts of data are generated every day. These data are large in scale and varied in type, comprising mainly structured and unstructured data. Among unstructured data, graph data has good practical application scenarios because of its strong expressive power and its ability to handle complex relationships. Starting from the functional and performance requirements of graph data loading, this thesis designs and implements an efficient distributed graph data loading tool with the Spark distributed framework as the underlying technology. The tool is divided into five functional modules: reading loading step files, parsing multi-format data source files, loading vertex data, loading edge data, and generating and loading association table data. Reading the loading step file serves the whole loading process: parsing the XML-format step file yields the key loading information, which is passed to each loading step. Parsing multi-format files extracts data from the data sources, mainly CSV files or tables in a relational database. Vertex data loading reads data from the data source, applies a series of transformations to convert it into the required data structure, and then performs full or incremental loading of the vertex data; at the same time it generates an index of the vertex data, which supports the loading of edge data. Edge data loading reads data from the data source, compares it against the vertex data index, and, after a series of transformations, performs full or incremental loading of the edge data. Association table generation and loading applies a series of transformations to the edge data that has been cleaned against the vertex data index in the graph database, producing the association table data. Finally, these data are stored in the graph database. Functional and performance tests of the graph data loader show that the system basically meets the relevant functional and performance requirements and that the whole loading process completes with relatively high efficiency.
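The step-driven flow described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it uses plain Python instead of Spark, in-memory strings instead of real files or database tables, and a dictionary instead of a graph database. The step-file layout (`<steps>`, `<step>`, and the `type`, `source`, `key`, `src`, `dst` attributes) and all function names are hypothetical, chosen only for illustration. Association table generation is omitted because the abstract does not describe its structure.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical step file; element and attribute names are illustrative only.
STEP_XML = """
<steps>
  <step type="vertex" source="people.csv" key="id"/>
  <step type="edge" source="knows.csv" src="from" dst="to"/>
</steps>
"""

# In-memory stand-ins for CSV data sources (assumed sample data).
SOURCES = {
    "people.csv": "id,name\n1,Ann\n2,Bob\n",
    "knows.csv": "from,to\n1,2\n2,3\n",
}


def parse_steps(xml_text):
    """Return the attributes of each <step> element as a dict, in file order."""
    root = ET.fromstring(xml_text.strip())
    return [dict(step.attrib) for step in root]


def read_rows(name):
    """Parse one CSV source into a list of row dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(SOURCES[name])))


def load_vertices(step, graph, index, incremental=False):
    """Load vertices and record their keys in the vertex index.

    In incremental mode, rows whose key is already indexed are skipped;
    in full mode, an existing vertex with the same key is overwritten.
    """
    for row in read_rows(step["source"]):
        key = row[step["key"]]
        if incremental and key in index:
            continue
        graph["vertices"][key] = row
        index.add(key)


def load_edges(step, graph, index):
    """Keep edges whose endpoints are both in the vertex index; reject the rest."""
    for row in read_rows(step["source"]):
        edge = (row[step["src"]], row[step["dst"]])
        if edge[0] in index and edge[1] in index:
            graph["edges"].append(edge)
        else:
            graph["rejected"].append(edge)


def run(xml_text):
    """Execute the loading steps in order and return the resulting graph."""
    graph = {"vertices": {}, "edges": [], "rejected": []}
    index = set()
    for step in parse_steps(xml_text):
        if step["type"] == "vertex":
            load_vertices(step, graph, index)
        elif step["type"] == "edge":
            load_edges(step, graph, index)
    return graph


if __name__ == "__main__":
    print(run(STEP_XML))
```

In this sketch the vertex step must precede the edge step, since edges are validated against the index built during vertex loading; the edge `2 -> 3` is rejected because vertex `3` was never loaded. In a Spark setting the index check would more plausibly be expressed as a join between the edge data and the vertex keys, but that is an assumption rather than something the abstract states.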

Keywords/Search Tags:

Distributed Processing Framework, Graph Data, Spark, Data Loading, Association Table

Related items

1	Research On Dynamic Adaptive Data Table Association Application Technology Based On Spark Framework
2	Research On The In-Memory Data Management Technology On Spark Data Processing Framework
3	The Research Of Distributed RDF Data Processing Architecture
4	Design And Realization Of IP Activity Table Based On A Distributed Infrastructure
5	Research On Resource Dynamic Allocation Technology On Spark Data Processing Framework
6	Research On Fast Data Cube Computation Method Based On Spark Platform
7	Research On Association Mining Optimization Based On Spark Distributed And Application Of Comprehensive Decision
8	Distributed Association Rules Algorithm Based On The Spark
9	Research And Application Of Distributed ETL Based On Spark
10	Design And Implementation Of Forum Data Analysis Platform Based On SPARK