Discovering and Maintaining Links on the Web of Data

Bibliographical Metadata
Subject: Link Discovery
Keywords: Linked data, web of data, link discovery, link maintenance, record linkage, duplicate detection
Year: 2009
Authors: Julius Volz, Christian Bizer, Martin Gaedke, Georgi Kobilarov
Venue: ISWC
Content Metadata
Problem: Link Discovery
Approach: No data available now.
Implementation: Silk–Linking
Evaluation: No data available now.

Abstract

The Web of Data is built upon two simple ideas: employ the RDF data model to publish structured data on the Web, and create explicit data links between entities within different data sources. This paper presents the Silk -- Linking Framework, a toolkit for discovering and maintaining data links between Web data sources. Silk consists of three components: (1) a link discovery engine, which computes links between data sources based on a declarative specification of the conditions that entities must fulfil in order to be interlinked; (2) a tool for evaluating the generated data links in order to fine-tune the linking specification; and (3) a protocol for maintaining data links between continuously changing data sources. The protocol allows data sources to exchange both linksets and detailed change information, and enables continuous link recomputation. The interplay of all the components is demonstrated within a life science use case.
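The link discovery idea described in the abstract can be illustrated with a minimal Python sketch. It is not Silk's implementation and does not use Silk-LSL syntax: the entity URIs, the labels, the choice of `difflib.SequenceMatcher` as the similarity metric, and the 0.9 threshold are all assumptions made for illustration. The linking condition here is simply "labels are similar enough", and each accepted pair is emitted as an `owl:sameAs` triple.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def label_similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, threshold=0.9):
    """Link each source entity to its best-matching target entity.

    `source` and `target` map entity URIs to labels. A link is emitted
    only when the best similarity score reaches `threshold`.
    """
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = label_similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAME_AS, best_uri))
    return links


# Hypothetical entities; the example.org URIs are placeholders only.
source = {
    "http://example.org/a/aspirin": "Aspirin",
    "http://example.org/a/ibuprofen": "Ibuprofen",
}
target = {
    "http://example.org/b/drug1": "aspirin",
    "http://example.org/b/drug2": "Paracetamol",
}

links = discover_links(source, target)
for triple in links:
    print(triple)
```

Comparing every source entity with every target entity is quadratic in the dataset sizes, so this sketch only suits small inputs. A declarative specification, as the abstract describes, would move the metric, the compared properties, and the threshold out of the code and into configuration.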

Conclusion

We presented the Silk framework, a flexible tool for discovering links between entities within different web data sources. The Silk-LSL link specification language was introduced and its applicability was demonstrated within a life science use case. We then proposed the WOD-LMP protocol for synchronizing and maintaining links between continuously changing Linked Data sources.

Future work

Future work on Silk will focus on the following areas. First, we will implement further similarity metrics to support a broader range of linking use cases. Second, to assist users in writing Silk-LSL specifications, machine learning techniques could be employed to adjust weightings or optimize the structure of the matching specification. Finally, we will evaluate the suitability of Silk for detecting duplicate entities within local datasets instead of using it to discover links between disparate RDF data sources. The value of the Web of Data rises and falls with the amount and the quality of links between data sources. We hope that Silk and other similar tools will help to strengthen the linkage between data sources and therefore contribute to the overall utility of the network.

Approach

Positive Aspects: No data available now.

Negative Aspects: No data available now.

Limitations: No data available now.

Challenges: No data available now.

Proposes Algorithm: No data available now.

Methodology: No data available now.

Requirements: No data available now.

Implementations

Download-page: http://silk.googlecode.com

Access API: No data available now.

Information Representation: No data available now.

Data Catalogue: No data available now.

Runs on OS: No data available now.

Vendor: No data available now.

Uses Framework: No data available now.

Has Documentation URL: http://www4.wiwiss.fu-berlin.de/bizer/silk/

Programming Language: Python

Version: No data available now.

Platform: No data available now.

Toolbox: No data available now.

GUI: Yes

Research Problem

Subproblem of: No data available now.

RelatedProblem: No data available now.

Motivation: No data available now.

Evaluation

Experiment Setup: No data available now.

Evaluation Method: A methodology that proved useful for optimizing link specifications is to manually create a small reference linkset and then tune the Silk linking specification until it reproduces these reference links, before Silk is run against the complete target data source.
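The reference-linkset methodology can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not Silk's evaluation tool: the pair identifiers, the similarity scores, the candidate thresholds, and the use of precision, recall and F1 as the comparison measure are all hypothetical choices. The sketch compares generated links with a small hand-made reference linkset and keeps the threshold that reproduces the reference links best.

```python
def evaluate_links(generated, reference):
    """Return (precision, recall, f1) of generated links against a reference linkset."""
    generated, reference = set(generated), set(reference)
    true_positives = len(generated & reference)
    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def tune_threshold(scored_pairs, reference, candidates):
    """Pick the candidate threshold whose link set scores the highest F1.

    `scored_pairs` maps (source, target) pairs to similarity scores.
    Ties are resolved in favour of the earlier candidate.
    """
    best_threshold, best_f1 = None, -1.0
    for threshold in candidates:
        generated = {pair for pair, score in scored_pairs.items() if score >= threshold}
        _, _, f1 = evaluate_links(generated, reference)
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Hypothetical scores and a manually created reference linkset.
scored_pairs = {("a1", "b1"): 0.95, ("a2", "b2"): 0.80, ("a3", "b9"): 0.60}
reference = {("a1", "b1"), ("a2", "b2")}

best_threshold, best_f1 = tune_threshold(scored_pairs, reference, [0.5, 0.7, 0.9])
print(best_threshold, best_f1)
```

Once a threshold (or, more generally, a specification) reproduces the reference links well on the small sample, it can be applied to the complete target data source, which is the step the methodology above defers until after tuning.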

Hypothesis: No data available now.

Description: No data available now.

Dimensions: No data available now.

Benchmark used: DBpedia, DrugBank

Results: No data available now.