Important questions about the scientific community, e.g., which authors are the experts in a certain field or are actively engaged in international collaborations, can be answered using publicly available datasets. However, the data required to answer such questions is often scattered over multiple isolated datasets. Recently, the Knowledge Graph (KG) concept has been identified as a means for interweaving heterogeneous datasets and enhancing answer completeness and soundness. We present a pipeline for creating high-quality knowledge graphs that comprise data collected from multiple isolated structured datasets. As a proof of concept, we illustrate the different steps in the construction of a knowledge graph in the domain of scholarly communication metadata (SCM-KG). In particular, we demonstrate the benefits of exploiting semantic web technology to reconcile data about authors, papers, and conferences. We conducted an experimental study on an SCM-KG that merges scientific research metadata from the DBLP bibliographic source and the Microsoft Academic Graph. The observed results provide evidence that queries are processed more effectively on top of the SCM-KG than over the isolated datasets, while execution time is not negatively affected.
In this paper, we presented the concept of a Scholarly Communication Metadata Knowledge Graph (SCM-KG), which integrates heterogeneous, distributed schemas, data, and metadata from a variety of scholarly communication data sources. As a proof of concept, we developed an SCM-KG pipeline to create a knowledge graph by integrating data collected from heterogeneous data sources. We showed the capability of parallelization in rule-based data mappings, and we also presented how semantic similarity measures are applied to determine the relatedness of concepts in two resources in terms of the relatedness of their RDF interlinking structure. Results of the empirical evaluation suggest that the integration approach pursued by the SCM-KG pipeline is able to effectively integrate pieces of information spread across different data sources. The experiments suggest that the rule-based mapping, together with the semantic-structure-based instance matching technique implemented in the SCM-KG pipeline, integrates data in a knowledge graph with high accuracy. Although our initial use case addresses the scientific metadata domain, we generated billions of triples with high accuracy in mapping and linking, and we regard the approach as capable of operating at an industrial scale and in use cases demanding high precision.