In recent years, with the development of big data, 5G and artificial intelligence, a large number of knowledge graph datasets based on graph databases have been published. Most of these datasets are organized in RDF (Resource Description Framework). To make full use of cross-domain datasets in research, researchers have proposed federated distributed RDF systems that centrally manage multiple RDF data sources. The distributed structure of such a system gives it inherent data security, so it has good application prospects in fields that require a high level of data security, such as government affairs and public security. Existing research on federated distributed RDF systems mostly focuses on implementing and optimizing basic federated queries, keyword queries and multi-queries in SPARQL 1.0. However, query processing and optimization for complex federated query types remain insufficient. The main challenges are as follows: (1) traditional optimization methods for basic federated queries in SPARQL 1.0 are based on exhaustive search and do not exploit the characteristics of top-k queries, so they generate a large number of intermediate results and incur serious performance overhead; (2) the aggregate and property path federated query types newly added in SPARQL 1.1 were published only recently and have not yet been studied. To address these challenges, this thesis focuses on the processing and optimization of three typical complex queries in federated distributed RDF systems. The work covers the following four aspects.

(1) Research on top-k query processing and optimization in federated distributed RDF systems
To address the inefficiency of top-k queries in existing federated distributed RDF systems, a new federated top-k query processing and optimization method is proposed. The method uses a cost model that estimates the query cost and join cost of the subqueries produced by query decomposition, which supports the generation of an optimal query plan. In addition, an incremental query plan execution method that exploits the characteristics of top-k queries is proposed to further improve the efficiency of federated top-k queries. Finally, ablation experiments verify the effectiveness of the method, and comparisons with existing federated systems verify its efficiency and robustness.

(2) Research on aggregate query processing and optimization in federated distributed RDF systems
To address the inefficiency of aggregate queries in existing federated distributed RDF systems, a federated aggregate query processing and optimization method that exploits the characteristics of aggregate queries is proposed. The method divides five typical aggregate queries into two categories: aggregate queries based on non-exhaustive retrieval (MIN and MAX) and aggregate queries based on exhaustive retrieval (COUNT, SUM and AVG). By transforming MIN and MAX aggregate queries into top-1 queries, the method reuses the federated top-k query processing and optimization scheme to answer aggregate queries based on non-exhaustive retrieval. For COUNT, SUM and AVG, an aggregate query method based on exhaustive retrieval is designed and implemented according to the characteristics of these aggregates. Finally, the validity of the scheme is verified and analyzed on different datasets.

(3) Research on property path query processing and optimization in federated distributed RDF systems
Existing federated distributed RDF systems do not support the federated property path queries of SPARQL 1.1. To address this, a federated property path query processing and optimization method based on MinDFA is proposed. The method constructs the minimal deterministic finite automaton (MinDFA) corresponding to a property path expression, which transforms the property path query problem into a MinDFA matching problem. A fast MinDFA matching method based on B-DFS is then proposed to evaluate federated property path queries efficiently. Finally, the effectiveness of the scheme is verified and analyzed on different datasets.

(4) Implementation of a complex query prototype system for federated distributed RDF systems
To address the inefficient query processing and incomplete query-type support of existing federated distributed RDF systems, a complex query prototype system for federated distributed RDF systems is designed and implemented. The system provides a user-friendly interactive interface and can efficiently handle SPARQL federated top-k queries, federated aggregate queries and federated property path queries.
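Item (1) describes the incremental execution of federated top-k plans only at a high level. The Python sketch below illustrates the general idea under simplifying assumptions that are not taken from the thesis: every source streams its candidate results already sorted by descending score, the streams are merged as a union rather than joined across subqueries, and the cost model is not represented. The names `federated_top_k`, `counted` and `Binding` are hypothetical. Because `heapq.merge` reads its inputs lazily, fetching stops as soon as k results have been emitted.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# A ranked result is modelled as (score, solution id); a hypothetical simplification.
Binding = Tuple[float, str]


def counted(source: Iterable[Binding], pulled: List[int], idx: int) -> Iterator[Binding]:
    """Wrap a source stream and record how many results were actually fetched from it."""
    for item in source:
        pulled[idx] += 1
        yield item


def federated_top_k(sources: List[Iterable[Binding]], k: int) -> Tuple[List[Binding], List[int]]:
    """Merge source streams sorted by descending score and stop after k results."""
    pulled = [0] * len(sources)
    streams = [counted(src, pulled, i) for i, src in enumerate(sources)]
    # heapq.merge consumes its inputs lazily, so each source is read only as far as needed.
    merged = heapq.merge(*streams, key=lambda b: b[0], reverse=True)
    return list(islice(merged, k)), pulled


if __name__ == "__main__":
    source_a = [(0.95, "a1"), (0.80, "a2"), (0.40, "a3"), (0.10, "a4")]
    source_b = [(0.90, "b1"), (0.85, "b2"), (0.30, "b3")]
    top, pulled = federated_top_k([source_a, source_b], k=3)
    print(top, pulled)
```

In this example only two results are fetched from each source; the lower-ranked results `a3`, `a4` and `b3` are never requested, which is the kind of saving an incremental plan aims for compared with materializing every intermediate result first.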
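Item (2) splits aggregates into MIN/MAX, answered as top-1 queries, and COUNT/SUM/AVG, answered by exhaustive retrieval. The sketch below shows one common way to realize both classes; the functions, the `Partial` record and the in-memory lists standing in for remote RDF sources are hypothetical. For MIN and MAX each source ships only its single best value, mimicking an `ORDER BY ... LIMIT 1` subquery pushed down to the endpoint. For the exhaustive class each source ships a (count, sum) pair, and AVG is computed from the global sum and count, because averaging per-source averages is wrong when sources hold different numbers of values.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Partial:
    """Per-source partial aggregate shipped to the federator (hypothetical format)."""
    count: int
    total: float


def source_top1(values: Sequence[float], op: str) -> Optional[float]:
    """Simulates a source answering MIN/MAX as a top-1 query: one value leaves the source."""
    if not values:
        return None
    return min(values) if op == "MIN" else max(values)


def source_partial(values: Sequence[float]) -> Partial:
    """Simulates a source scanning all of its matching values (exhaustive retrieval)."""
    return Partial(count=len(values), total=float(sum(values)))


def federated_aggregate(sources: List[Sequence[float]], op: str) -> Optional[float]:
    op = op.upper()
    if op in ("MIN", "MAX"):
        # Non-exhaustive class: combine one top-1 candidate per source.
        candidates = [v for v in (source_top1(s, op) for s in sources) if v is not None]
        if not candidates:
            return None
        return min(candidates) if op == "MIN" else max(candidates)
    if op in ("COUNT", "SUM", "AVG"):
        # Exhaustive class: every source contributes a (count, sum) partial.
        partials = [source_partial(s) for s in sources]
        count = sum(p.count for p in partials)
        total = sum(p.total for p in partials)
        if op == "COUNT":
            return count
        if op == "SUM":
            return total
        # Global sum / global count; an empty input yields 0.0, as SPARQL 1.1 defines AVG.
        return total / count if count else 0.0
    raise ValueError(f"unsupported aggregate: {op}")


if __name__ == "__main__":
    sources = [[3, 7, 1], [4, 9], [2, 2, 8]]
    for op in ("MIN", "MAX", "COUNT", "SUM", "AVG"):
        print(op, federated_aggregate(sources, op))
```

With these three sources the global average is 36 / 8 = 4.5, whereas averaging the per-source averages (11/3, 6.5 and 4) would give about 4.72.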
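Item (3) reduces property path evaluation to matching against a MinDFA. The sketch below shows the core idea as a depth-first traversal of the product of the data graph and the automaton, with a visited set over (node, state) pairs so that cycles in the data terminate. It is a plain DFS, not the B-DFS method named in the text, whose details are not given here; the path expression `ex:knows+/ex:worksAt`, its hand-built minimal DFA, and the triple lists standing in for remote sources are all hypothetical. Outgoing edges are looked up in every source because consecutive edges of one path may be stored in different sources.

```python
from typing import Dict, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hand-built minimal DFA for the hypothetical path expression ex:knows+/ex:worksAt.
# State 0: start; state 1: at least one ex:knows seen; state 2: accepting.
DFA_START = 0
DFA_ACCEPT = {2}
DFA_DELTA: Dict[Tuple[int, str], int] = {
    (0, "ex:knows"): 1,
    (1, "ex:knows"): 1,
    (1, "ex:worksAt"): 2,
}


def outgoing(sources: List[List[Triple]], node: str) -> Iterator[Tuple[str, str]]:
    """Yield (predicate, object) edges leaving `node`, gathered from every source."""
    for triples in sources:
        for s, p, o in triples:
            if s == node:
                yield p, o


def eval_path(sources: List[List[Triple]], start: str) -> Set[str]:
    """Return the end nodes reachable from `start` along a path accepted by the DFA."""
    results: Set[str] = set()
    visited: Set[Tuple[str, int]] = set()
    stack = [(start, DFA_START)]
    while stack:
        node, state = stack.pop()
        if (node, state) in visited:
            continue  # revisiting a (node, state) pair can never produce new answers
        visited.add((node, state))
        if state in DFA_ACCEPT:
            results.add(node)
        for pred, obj in outgoing(sources, node):
            nxt = DFA_DELTA.get((state, pred))  # a missing entry is the implicit dead state
            if nxt is not None:
                stack.append((obj, nxt))
    return results


if __name__ == "__main__":
    source_1 = [("ex:alice", "ex:knows", "ex:bob"),
                ("ex:bob", "ex:knows", "ex:carol"),
                ("ex:carol", "ex:knows", "ex:alice")]
    source_2 = [("ex:bob", "ex:worksAt", "ex:acme"),
                ("ex:carol", "ex:worksAt", "ex:globex"),
                ("ex:alice", "ex:worksAt", "ex:initech")]
    print(sorted(eval_path([source_1, source_2], "ex:alice")))
```

Starting from `ex:alice`, the traversal combines `ex:knows` edges from one source with `ex:worksAt` edges from the other. `ex:initech` is reached only through the cycle from alice via bob and carol back to alice, because `ex:knows+` requires at least one `ex:knows` edge; the visited set stops the cycle from being followed forever.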