Design and Implementation of a Spark-based RDF Streaming Data Real-time Query System

Posted on: 2023-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: J T Jiao
GTID: 2568306815991369
Subject: Computer technology
Abstract/Summary:
The Industrial Internet closely integrates industry with a new generation of information technology, with the goal of interconnecting everything in the industrial field. The Internet of Things (IoT) is an important part of the Industrial Internet. It perceives and integrates data at the edge of the network and, through big data analysis, mines value from massive data to support intelligent applications in different scenarios. Semanticization is an advanced form of data application. With the development of semantic technology, the semantic IoT and related technologies have become a research focus and have received extensive attention. Streams are a typical carrier of IoT data, so combining semantic technology with stream processing technology is an important idea and an effective way to achieve data interoperability and intelligent applications in the IoT, and it has significant research value.

Addressing the acquisition and real-time query of semantic information in massive data, this thesis focuses on query methods for RDF data, which is used to express semantic information. These methods fall into two query schemes. The first is the forward query scheme: RSP (RDF Stream Processing) engines that use this scheme perform reasoning queries starting from a preset knowledge base. The advantage is that real-time requirements can be met; the disadvantage is that the limits of the knowledge base mean that some query requirements cannot be met. The second is the backward query scheme: RSP engines that use this scheme extend SPARQL (the W3C standard RDF query language) for real-time querying. The advantage is that complex semantic logic can be handled; the disadvantage is that query time complexity is high, so real-time requirements are difficult to meet.

Combining the characteristics of these two query methods for RDF stream data, this thesis designs and implements a Spark-based real-time query system for RDF stream data, the RSP-FB system. The core of the system consists of two modules. The first is the forward query module. Its design starts with the OH-CQ method, which uses conjunctive queries, expressed as SPARQL statements, to transform OWL axioms and the OWL Horst rule set, thereby implementing the OWL Horst rules. The characteristics of the RDFS and OWL Horst rule sets are then analyzed, and query optimization is achieved by reducing the number of iterations of the reasoning process. The second is the backward query module. First, C-SPARQL is selected as the query language. The statement parsing layer then produces the syntax tree and the corresponding logical plan, and the generated logical plan is optimized with a graph path optimization algorithm to produce the final query plan, which is distributed to different nodes of the Spark cluster for execution.

To verify the overall performance of the RSP-FB system, the LUBM dataset and the SRBench dataset are used to run experiments on and evaluate the forward query and backward query modules, respectively. The results show that the system can query RDF streaming data with good scalability and real-time query capability.
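The forward query idea described above, where an entailment rule is written as a conjunctive query and applied repeatedly until no new triples appear, can be illustrated with a minimal sketch. This is not the thesis's OH-CQ implementation and it does not use Spark or SPARQL: the rule encoding, the function names (`match`, `conjunctive_query`, `forward_chain`) and the `ex:` sample data are all assumptions made for illustration, and the loop is a naive fixpoint without the iteration-reducing optimization the abstract mentions.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):           # variable: bind it or check consistency
            if new.setdefault(p, t) != t:
                return None
        elif p != t:                    # constant: must match exactly
            return None
    return new


def conjunctive_query(patterns: List[Triple], triples: Set[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns by joining them one at a time."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match(pattern, t, b)) is not None
        ]
    return bindings


# Each rule is (body patterns, head pattern). Illustrative subset only.
RULES = [
    # rdfs:subClassOf is transitive
    ([("?a", "rdfs:subClassOf", "?b"), ("?b", "rdfs:subClassOf", "?c")],
     ("?a", "rdfs:subClassOf", "?c")),
    # instances inherit membership in superclasses
    ([("?x", "rdf:type", "?a"), ("?a", "rdfs:subClassOf", "?b")],
     ("?x", "rdf:type", "?b")),
    # transitive properties, of the kind covered by OWL Horst
    ([("?p", "rdf:type", "owl:TransitiveProperty"), ("?x", "?p", "?y"), ("?y", "?p", "?z")],
     ("?x", "?p", "?z")),
]


def forward_chain(triples: Set[Triple]) -> Tuple[Set[Triple], int]:
    """Apply all rules until a fixpoint; return the closure and the round count."""
    closure = set(triples)
    rounds = 0
    while True:
        rounds += 1
        derived = {
            tuple(b.get(term, term) for term in head)
            for body, head in RULES
            for b in conjunctive_query(body, closure)
        }
        new = derived - closure
        if not new:                     # nothing new was inferred: fixpoint reached
            return closure, rounds
        closure |= new


# Hypothetical sample data (not taken from LUBM or SRBench).
facts = {
    ("ex:GradStudent", "rdfs:subClassOf", "ex:Student"),
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:GradStudent"),
    ("ex:partOf", "rdf:type", "owl:TransitiveProperty"),
    ("ex:dept", "ex:partOf", "ex:school"),
    ("ex:school", "ex:partOf", "ex:univ"),
}
closure, rounds = forward_chain(facts)
print(("ex:alice", "rdf:type", "ex:Person") in closure)
```

Each pass re-evaluates every rule over the whole closure, so the number of rounds grows with the depth of the inference chains. That cost is what an iteration-reducing optimization of the kind the abstract describes would target.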
Keywords/Search Tags:RSP-FB System, Real-time Query, RDF Streaming, OH-CQ Method, SPARQL