Adaptive Integration of Distributed Semantic Web Data

Bibliographical Metadata
Subject:	Querying Distributed RDF Data Sources
Year:	2010
Authors:	Steven Lynden, Isao Kojima, Akiyoshi Matono, Yusuke Tanimura
Venue:	DNIS
Content Metadata
Problem:	SPARQL Query Federation
Approach:	Distributed Query Processing
Implementation:	ADERIS
Evaluation:	Performance Analysis

Abstract

The use of RDF (Resource Description Framework) data is a cornerstone of the Semantic Web. RDF data embedded in Web pages may be indexed by semantic search engines; however, RDF data is often stored in databases that are accessible via Web Services using SPARQL, the query language for RDF. Such databases form part of the Deep Web, which is not accessible to search engines. This paper addresses the problem of effectively integrating RDF data stored in separate Web-accessible databases. An approach based on distributed query processing is described, in which data from multiple repositories are used to construct partitioned tables that are integrated using an adaptive query processing technique supporting join reordering. This limits reliance on statistics and metadata about SPARQL endpoints, which are often inaccurate or unavailable but are required by existing systems supporting federated SPARQL queries. The approach extends existing work in this area by allowing tables to be added to the query plan while it is executing, and shows how a technique currently used in relational query processing can be applied to distributed SPARQL query processing. The approach is evaluated using a prototype implementation, and potential applications are discussed.
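As a rough illustration of the partitioned tables mentioned in the abstract, the following minimal sketch merges triples returned by several endpoints into one (subject, object) table per predicate and then joins two such tables on a shared variable. It rests on stated assumptions: the endpoint URLs, the triples and the helper names build_predicate_tables and join_on_subject_object are hypothetical, results are held in memory, and network access to real SPARQL endpoints is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Pair = Tuple[str, str]

# Hypothetical results from two endpoints (illustrative data only).
ENDPOINT_RESULTS: Dict[str, List[Triple]] = {
    "http://endpoint-a.example/sparql": [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob", "foaf:name", '"Bob"'),
    ],
    "http://endpoint-b.example/sparql": [
        ("ex:alice", "foaf:name", '"Alice"'),
        ("ex:carol", "foaf:knows", "ex:alice"),
    ],
}


def build_predicate_tables(results: Dict[str, List[Triple]]) -> Dict[str, List[Pair]]:
    """Partition triples from all endpoints into one (subject, object) table per predicate."""
    tables: Dict[str, List[Pair]] = defaultdict(list)
    for triples in results.values():
        for s, p, o in triples:
            tables[p].append((s, o))
    return dict(tables)


def join_on_subject_object(left: List[Pair], right: List[Pair]) -> List[Tuple[str, str, str]]:
    """Join left.object = right.subject, as for ?x foaf:knows ?y . ?y foaf:name ?n."""
    index: Dict[str, List[str]] = defaultdict(list)
    for s, o in right:
        index[s].append(o)  # hash the right table on its subject column
    return [(ls, lo, ro) for ls, lo in left for ro in index.get(lo, [])]


tables = build_predicate_tables(ENDPOINT_RESULTS)
print(join_on_subject_object(tables["foaf:knows"], tables["foaf:name"]))
```

Because each table holds rows for a single predicate regardless of which endpoint supplied them, a triple pattern whose predicate is bound maps directly onto one table.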

Conclusion

An adaptive framework has been presented for executing queries over multiple SPARQL endpoints; it differs from existing approaches, which use static query optimisation techniques. Many SPARQL Web services are currently available, and their number is growing; the framework targets federations of such services. It allows adaptive query processing over dynamically constructed predicate tables to be performed while those tables are still being constructed, and was shown to perform relatively well in unpredictable environments where source query failures may occur. The prototype was evaluated using real data (a subset of DBpedia), showing some advantage in response time for adaptive over non-adaptive methods.
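The idea of adding tables to the plan while it executes can be sketched as follows. This is not the ADERIS algorithm: it swaps in a simple greedy heuristic, assumed for illustration, that repeatedly joins the smallest already-loaded predicate table sharing a variable with the current result, reading table sizes at run time instead of relying on endpoint statistics. Each inner list of batches stands for the tables that finish loading between two scheduling steps; PredicateTable, hash_join and adaptive_execute are hypothetical names.

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, str]  # variable -> value, e.g. {"?x": "ex:alice"}


class PredicateTable:
    """Hypothetical materialised bindings for one triple pattern."""

    def __init__(self, name: str, variables: Set[str], rows: List[Row]):
        self.name = name
        self.variables = variables
        self.rows = rows


def hash_join(left: List[Row], right: List[Row], shared: Set[str]) -> List[Row]:
    """Equi-join two lists of bindings on their shared variables."""
    key = sorted(shared)
    index: Dict[Tuple[str, ...], List[Row]] = {}
    for rrow in right:
        index.setdefault(tuple(rrow[v] for v in key), []).append(rrow)
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in index.get(tuple(lrow[v] for v in key), [])
    ]


def adaptive_execute(batches: List[List[PredicateTable]]) -> Tuple[List[Row], List[str]]:
    """Greedy heuristic (assumed, not the paper's algorithm): join the
    smallest loaded table that shares a variable with the current result."""
    pending: List[PredicateTable] = []
    result: Optional[List[Row]] = None
    result_vars: Set[str] = set()
    order: List[str] = []
    for batch in batches:
        pending.extend(batch)  # tables enter the plan while it is executing
        while pending:
            candidates = pending if result is None else [
                t for t in pending if t.variables & result_vars
            ]
            if not candidates:
                break  # nothing joinable yet; wait for the next batch
            nxt = min(candidates, key=lambda t: len(t.rows))  # actual, not estimated, size
            pending.remove(nxt)
            if result is None:
                result = list(nxt.rows)
            else:
                result = hash_join(result, nxt.rows, nxt.variables & result_vars)
            result_vars |= nxt.variables
            order.append(nxt.name)
    if pending:
        raise ValueError("disconnected query: cross products are not handled")
    return result or [], order


knows = PredicateTable("knows", {"?x", "?y"},
                       [{"?x": "ex:alice", "?y": "ex:bob"}, {"?x": "ex:carol", "?y": "ex:alice"}])
age = PredicateTable("age", {"?x", "?a"},
                     [{"?x": "ex:alice", "?a": "30"}, {"?x": "ex:carol", "?a": "41"},
                      {"?x": "ex:dave", "?a": "25"}])
name = PredicateTable("name", {"?y", "?n"}, [{"?y": "ex:bob", "?n": "Bob"}])  # arrives late

rows, order = adaptive_execute([[knows, age], [name]])
print(order)  # join order chosen at run time
print(rows)
```

In this run, name arrives in a later batch and is joined after knows and age. The join order therefore emerges from when tables arrive and how large they actually are, so no cardinality estimates are needed before execution starts.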

Future work

Future work will investigate other, larger data sets with different characteristics. As the approach presented in this paper focuses on efficiently executing one specific kind of query, namely queries whose multiple joins are ordered adaptively, further work will address optimising other kinds of queries and supporting more features of the SPARQL query language. Future work will also investigate how the approach can be applied in various domains.

Approach

Positive Aspects: No data available now.

Negative Aspects: No data available now.

Limitations: No data available now.

Challenges: No data available now.

Proposes Algorithm: No data available now.

Methodology: No data available now.

Requirements: No data available now.

Implementations

Download-page: No data available now.

Access API: No data available now.

Information Representation: No data available now.

Data Catalogue: Predicate list obtained during the setup phase

Runs on OS: No data available now.

Vendor: No data available now.

Uses Framework: No data available now.

Has Documentation URL: No data available now.

Programming Language: No data available now.

Version: No data available now.

Platform: No data available now.

Toolbox: No data available now.

GUI: Yes
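The Data Catalogue entry above records that a predicate list is obtained during the setup phase. A minimal sketch of how such a list might drive source selection follows. It assumes each endpoint's predicate list has already been retrieved (for example with a query such as SELECT DISTINCT ?p WHERE { ?s ?p ?o }, which is an assumption rather than a documented ADERIS query) and inverts the lists into a predicate-to-endpoints map. The endpoint URLs and function names are hypothetical, and triple patterns with a variable predicate are not handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_catalogue(predicate_lists: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Invert per-endpoint predicate lists into a predicate -> endpoints map."""
    catalogue: Dict[str, List[str]] = defaultdict(list)
    for endpoint, predicates in predicate_lists.items():
        for p in set(predicates):  # ignore duplicates reported by one endpoint
            catalogue[p].append(endpoint)
    return {p: sorted(eps) for p, eps in catalogue.items()}


def select_sources(query_predicates: Iterable[str],
                   catalogue: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Endpoints worth querying for each predicate; unknown predicates map to []."""
    return {p: catalogue.get(p, []) for p in query_predicates}


# Hypothetical predicate lists as they might be gathered during setup.
setup_lists = {
    "http://a.example/sparql": ["foaf:name", "foaf:knows"],
    "http://b.example/sparql": ["foaf:name", "dbo:birthPlace"],
}
catalogue = build_catalogue(setup_lists)
print(select_sources(["foaf:name", "dbo:birthPlace", "ex:unknown"], catalogue))
```

Only the endpoints listed for a predicate need to be contacted when its table is built, which keeps the per-query metadata requirement down to this one list.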

Research Problem

Subproblem of: No data available now.

Related Problem: No data available now.

Motivation: No data available now.

Evaluation

Experiment Setup: Endpoint machines are connected via a 100 Mbit/s Ethernet LAN to the machine on which the mediator is deployed (2 GHz AMD Athlon X2, 2 GB RAM).

Evaluation Method: No data available now.

Hypothesis: No data available now.

Description: No data available now.

Dimensions: Performance

Benchmark used: DBpedia (subset)

Results: No data available now.