Integration of Scholarly Communication Metadata using Knowledge Graphs
Integration of Scholarly Communication Metadata using Knowledge Graphs | |
---|---|
Bibliographical Metadata | |
Year: | 2017 |
Authors: | Afshin Sadeghi, Christoph Lange, Maria-Esther Vidal, Sören Auer |
Venue: | TPDL |
Content Metadata | |
Problem: | Semantifying scholarly artifacts |
Approach: | No data available now. |
Implementation: | No data available now. |
Evaluation: | No data available now. |
Abstract
Important questions about the scientific community, e.g., which authors are the experts in a certain field or are actively engaged in international collaborations, can be answered using publicly available datasets. However, the data required to answer such questions is often scattered across multiple isolated datasets. Recently, the Knowledge Graph (KG) concept has been identified as a means for interweaving heterogeneous datasets and enhancing answer completeness and soundness. We present a pipeline for creating high-quality knowledge graphs that comprise data collected from multiple isolated structured datasets. As a proof of concept, we illustrate the different steps in the construction of a knowledge graph in the domain of scholarly communication metadata (SCM-KG). In particular, we demonstrate the benefits of exploiting semantic web technology to reconcile data about authors, papers, and conferences. We conducted an experimental study on an SCM-KG that merges scientific research metadata from the DBLP bibliographic source and the Microsoft Academic Graph. The observed results provide evidence that queries are processed more effectively on top of the SCM-KG than over the isolated datasets, while execution time is not negatively affected.
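The completeness claim in the abstract can be illustrated with a toy example. The sketch below is not the SCM-KG pipeline: the triples, the identifiers (`author:1`, `paper:1`, `ConfX`), and the helper `authors_with_affiliation_at_venue` are all invented for illustration, and it assumes the author and paper identifiers have already been reconciled across the two sources.

```python
# Toy triples (subject, predicate, object). All data is hypothetical.
source_a = {
    ("author:1", "name", "A. Example"),
    ("author:1", "wrote", "paper:1"),
}
source_b = {
    ("author:1", "affiliation", "Example University"),
    ("paper:1", "venue", "ConfX"),
}


def authors_with_affiliation_at_venue(triples, venue):
    """Return (author, affiliation) pairs for authors who published at `venue`."""
    papers = {s for s, p, o in triples if p == "venue" and o == venue}
    authors = {s for s, p, o in triples if p == "wrote" and o in papers}
    return {(s, o) for s, p, o in triples if p == "affiliation" and s in authors}


# Integration is modeled here as a plain set union of the two sources.
merged = source_a | source_b

only_a = authors_with_affiliation_at_venue(source_a, "ConfX")
only_b = authors_with_affiliation_at_venue(source_b, "ConfX")
on_merged = authors_with_affiliation_at_venue(merged, "ConfX")
```

Neither source alone holds both the authorship and the venue facts, so the query returns nothing on `source_a` or `source_b` and yields an answer only on `merged`.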
Conclusion
In this paper, we presented the concept of a Scholarly Communication Metadata Knowledge Graph (SCM-KG), which integrates heterogeneous, distributed schemas, data, and metadata from a variety of scholarly communication data sources. As a proof of concept, we developed an SCM-KG pipeline that creates a knowledge graph by integrating data collected from heterogeneous data sources. We showed that rule-based data mappings can be parallelized, and we presented how semantic similarity measures are applied to determine the relatedness of concepts in two resources in terms of the relatedness of their RDF interlinking structure. Results of the empirical evaluation suggest that the integration approach pursued by the SCM-KG pipeline effectively integrates pieces of information spread across different data sources. The experiments suggest that the rule-based mapping, together with the semantic-structure-based instance matching technique implemented in the SCM-KG pipeline, integrates data into a knowledge graph with high accuracy. Although our initial use case addresses the scientific metadata domain, we generated billions of triples with high accuracy in mapping and linking, and we regard the approach as capable of operating at an industrial scale and in use cases demanding high precision.
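The idea of matching instances by the relatedness of their RDF interlinking structure can be sketched as follows. This page does not state which similarity measure the paper uses, so Jaccard overlap of each resource's (predicate, object) neighborhood is swapped in here as a simple stand-in. The function names, the threshold of 0.5, and the sample triples are all assumptions made for illustration.

```python
from itertools import product


def neighborhood(triples, resource):
    """Collect the (predicate, object) pairs attached to `resource`."""
    return {(p, o) for s, p, o in triples if s == resource}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_instances(triples_a, triples_b, threshold=0.5):
    """Link subjects whose neighborhoods overlap at or above `threshold`."""
    subjects_a = sorted({s for s, _, _ in triples_a})
    subjects_b = sorted({s for s, _, _ in triples_b})
    links = []
    for a, b in product(subjects_a, subjects_b):
        score = jaccard(neighborhood(triples_a, a), neighborhood(triples_b, b))
        if score >= threshold:
            links.append((a, b, score))
    return links


# Hypothetical sample data: two sources describing the same author.
source_a = {
    ("a:1", "name", "Jane Doe"),
    ("a:1", "wrote", "Paper X"),
    ("a:1", "venue", "ConfX"),
}
source_b = {
    ("b:9", "name", "Jane Doe"),
    ("b:9", "wrote", "Paper X"),
    ("b:9", "field", "Databases"),
    ("b:7", "name", "John Roe"),
}

links = match_instances(source_a, source_b)
```

The all-pairs comparison is quadratic in the number of subjects, so a sketch like this would need blocking or indexing before it could be applied to graphs of the size the conclusion mentions.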
Future work
In the context of the OSCOSS project on Opening Scholarly Communication in the Social Sciences, the SCM-KG approach will be used for providing authors with precise and complete lists of references during the article writing process.
Approach
Positive Aspects: No data available now.
Negative Aspects: No data available now.
Limitations: No data available now.
Challenges: No data available now.
Proposes Algorithm: No data available now.
Methodology: No data available now.
Requirements: No data available now.
Implementations
Download-page: No data available now.
Access API: No data available now.
Information Representation: No data available now.
Data Catalogue: No data available now.
Runs on OS: No data available now.
Vendor: No data available now.
Uses Framework: No data available now.
Has Documentation URL: No data available now.
Programming Language: No data available now.
Version: No data available now.
Platform: No data available now.
Toolbox: No data available now.
GUI: No
Research Problem
Subproblem of: No data available now.
Related Problem: No data available now.
Motivation: No data available now.
Evaluation
Experiment Setup: No data available now.
Evaluation Method: No data available now.
Hypothesis: No data available now.
Description: No data available now.
Dimensions: No data available now.
Benchmark used: No data available now.
Results: No data available now.