SPARQL has been widely adopted since it was first proposed as the query language for the Semantic Web. Many SPARQL endpoints are available today, both public and private, exposing interlinked data sources that are all part of the global RDF data cloud.
SPARQL federation offers a mechanism for integrating RDF data distributed across multiple sources. It allows data consumers to retrieve and join data from those sources with a single query, effectively exposing the data as a single integrated RDF graph.
SPARQL federation is a dedicated SPARQL language construct defined in SPARQL 1.1 Federated Query, a W3C Recommendation that introduces the SPARQL SERVICE keyword. When using the SERVICE clause, you specify the URL of the SPARQL endpoint to retrieve the data from, together with the query pattern to evaluate against that endpoint, as demonstrated in the example below.
The following example shows how to query a local RDF graph combined with the data from a remote SPARQL endpoint.
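Suppose that the local RDF graph contains only one triple, stating that http://example.org/Anton knows http://example.org/Alice, while the remote RDF dataset available at the http://people.example.org/sparql endpoint holds the names of people, including the name of http://example.org/Alice.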
Locally, we know that http://example.org/Anton knows http://example.org/Alice, but to get the name of http://example.org/Alice, we need to query the remote http://people.example.org/sparql endpoint. A single federated query is enough to retrieve the names of all people that http://example.org/Anton knows.
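A minimal sketch of such a query follows, held in a Python string so that it can be handed to whichever SPARQL client is in use. The foaf:knows and foaf:name predicates are assumptions: the text only says that Anton "knows" Alice and that the remote endpoint holds names.

```python
# Federated query: the outer pattern is matched against the local graph,
# the SERVICE block is evaluated by the remote endpoint.
# Assumption: the data is modelled with the FOAF vocabulary.
FEDERATED_QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name
WHERE {
  <http://example.org/Anton> foaf:knows ?person .
  SERVICE <http://people.example.org/sparql> {
    ?person foaf:name ?name .
  }
}
"""

print(FEDERATED_QUERY)
```

The ?person variable appears both inside and outside the SERVICE block, which is what joins the local and the remote data.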
This query joins the local data with the response from the remote SPARQL endpoint and returns the name of every person that http://example.org/Anton knows, which in this example is the name of http://example.org/Alice.
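The join itself can be mimicked in plain Python, with two in-memory sets of triples standing in for the local graph and the remote dataset. This is only an illustration of the semantics, not how a query processor is implemented; the FOAF predicate IRIs and the literal "Alice" are assumed sample values.

```python
# Hypothetical sample data; the name literal "Alice" is an assumption.
ANTON = "http://example.org/Anton"
ALICE = "http://example.org/Alice"
KNOWS = "http://xmlns.com/foaf/0.1/knows"
NAME = "http://xmlns.com/foaf/0.1/name"

local_graph = {(ANTON, KNOWS, ALICE)}      # the single local triple
remote_graph = {(ALICE, NAME, "Alice")}    # stands in for the remote endpoint

def match(graph, subject=None, predicate=None):
    """Yield triples with the given subject and predicate (None = wildcard)."""
    for s, p, o in graph:
        if subject in (None, s) and predicate in (None, p):
            yield s, p, o

def names_of_people_known_by(who):
    # Local pattern: <who> foaf:knows ?person
    people = [o for _, _, o in match(local_graph, subject=who, predicate=KNOWS)]
    # SERVICE pattern: ?person foaf:name ?name, joined on ?person
    return sorted(
        name
        for person in people
        for _, _, name in match(remote_graph, subject=person, predicate=NAME)
    )

print(names_of_people_known_by(ANTON))
```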
Under the hood, the SERVICE keyword makes the query processor send a query to another SPARQL endpoint while it executes the enclosing query. Many RDF databases and SPARQL services support SPARQL 1.1 Federated Query and the SERVICE keyword.
As part of executing the federated query, the query processor calls the external SPARQL endpoint. This comes with a number of potential issues that need to be addressed.
The SERVICE keyword makes the federated query processor run a portion of a SPARQL query against a remote SPARQL endpoint over HTTP. The overhead of HTTP communication makes those sub-queries slower, which adds to the execution time of the whole federated query.
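To make the HTTP round trip concrete, the sketch below builds, without sending, the kind of request a processor could issue for a SERVICE pattern. Under the SPARQL 1.1 Protocol, a query can be sent as the query parameter of a GET request. The helper name build_service_request, the shape of the sub-query, and the choice of JSON results are assumptions for illustration; a real processor decides these details itself.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://people.example.org/sparql"

# Assumed shape of the sub-query derived from the SERVICE pattern.
REMOTE_QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name WHERE { ?person foaf:name ?name }
"""

def build_service_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request for the query."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_service_request(ENDPOINT, REMOTE_QUERY)
print(request.get_method())
print(request.full_url)
```

Every SERVICE pattern costs at least one such request and response, so the latency of the remote endpoint is added to the local execution time.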
If the remote SPARQL service is unavailable, returns an error, or cannot be accessed for other reasons, the federated query execution will fail as a whole.
It may be desirable to ignore errors from the remote service, in which case the query does not fail as a whole and only the SERVICE pattern is ignored. This can be achieved by adding the SILENT keyword to the SERVICE clause.
A query that uses SERVICE SILENT with the remote http://people.example.org/sparql SPARQL endpoint ignores all errors encountered while accessing that endpoint.
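The sketch below shows the SILENT form of the earlier example and simulates the difference in behaviour. According to SPARQL 1.1 Federated Query, a failed SERVICE SILENT evaluates to a single solution with no bindings, so joining with it leaves the local solutions unchanged. The function names and the FOAF predicates are assumptions, and the failing endpoint is a stand-in rather than a real network call.

```python
SILENT_QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name
WHERE {
  <http://example.org/Anton> foaf:knows ?person .
  SERVICE SILENT <http://people.example.org/sparql> {
    ?person foaf:name ?name .
  }
}
"""

def join(left, right):
    """Join two lists of solutions (dicts) on the variables they share."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[var] == r[var] for var in l.keys() & r.keys())
    ]

def evaluate_service(fetch, silent):
    """Run the remote call; swallow connection errors only when silent."""
    try:
        return fetch()
    except OSError:
        if not silent:
            raise      # plain SERVICE: the whole query fails
        return [{}]    # SERVICE SILENT: one solution with no bindings

def unreachable_endpoint():
    # Stand-in for a failed HTTP call; ConnectionError subclasses OSError.
    raise ConnectionError("endpoint unavailable")

local_solutions = [{"person": "http://example.org/Alice"}]
print(join(local_solutions, evaluate_service(unreachable_endpoint, silent=True)))
```

With silent=True the local solution survives and ?name simply stays unbound; with silent=False the exception propagates and the query fails.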
When query processors execute federated queries, the external endpoint URIs are dereferenced and the SERVICE queries and parameters are passed to those external SPARQL query processors.
The SERVICE clause offers no way to pass credentials, so SPARQL federation itself does not support authentication. When your use case involves federated queries distributed over multiple private SPARQL endpoints, it is your responsibility to secure the network and to make sure that your remote SPARQL services are accessible only from within that network.
The external SPARQL endpoints, together with the data received and incorporated into the query output, all need to be verified. Therefore, you have to make sure that they satisfy your data processing and licensing requirements.
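One simple way to support such verification is to check a query's SERVICE endpoints against a list of vetted ones before running it. The sketch below is a naive text scan, not a SPARQL parser: it does not handle variable endpoints (SERVICE ?endpoint) or prefixed names, and it can be misled by comments and string literals. The TRUSTED_ENDPOINTS set, the function name, and the second endpoint URL are hypothetical.

```python
import re

# Hypothetical allowlist of endpoints vetted for data quality and licensing.
TRUSTED_ENDPOINTS = {"http://people.example.org/sparql"}

# Matches SERVICE <iri> and SERVICE SILENT <iri>, case-insensitively.
SERVICE_IRI = re.compile(r"\bSERVICE\s+(?:SILENT\s+)?<([^>]+)>", re.IGNORECASE)

def untrusted_endpoints(query):
    """Return the SERVICE endpoint IRIs in the query that are not allowlisted."""
    return sorted(set(SERVICE_IRI.findall(query)) - TRUSTED_ENDPOINTS)

query = """
SELECT * WHERE {
  SERVICE <http://people.example.org/sparql> { ?s ?p ?o }
  SERVICE SILENT <http://other.example.org/sparql> { ?s ?q ?r }
}
"""
print(untrusted_endpoints(query))
```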
SPARQL 1.1 Federated Query allows you to distribute your RDF data across multiple databases and use a single query to access it across all those database instances. The data does not need to be colocated or made publicly accessible.
SPARQL federation effectively enriches your working datasets with external public or private data. It is a mechanism for querying and retrieving the data from the joined global RDF graph.