
PlanetData Lab


Title: PlanetData Lab

Number: WP05

Activity: Activity A2 Data provisioning and management

Activity leader: Karl Aberer (Activity A2 Data provisioning and management)

Lead partner: EPFL

Work package leader: Nguyen Quoc Viet Hung

Overview

PlanetData lab

We maintain a collection of tools that support large-scale data management, with particular attention to Linked Data and sensor data. The tools listed in this catalogue were developed by PlanetData partners, who also support the PlanetData community in using them. The catalogue covers data management tools (RDF repositories, relational databases, sensor data management tools), RDF data provisioning tools (which enable RDF access to legacy data or to data stored in formats other than RDF), linked data tools (e.g. to crawl linked data, discover relations or manage SPARQL endpoints), large-scale reasoning frameworks, RDF vocabulary lookup for RDF developers, and tools that enable ontology-based access to stream data.


Technologies

Entry | Name | Contact person | Lead partner | Category
Corpex | Corpex - Corpora Explorer | Denny Vrandecic | KIT | Tool, Produce, DataManagement
CumulusRDF | CumulusRDF | Andreas Harth | KIT | Tool, Publish
D2R | D2R | Christian Bizer | FUB | Tool, Publish, Provisioning
DBpedia Spotlight | DBpedia Spotlight | Pablo Mendes | FUB | Tool, Produce, Consume
Diversity extensions for Drupal | Diversity extensions for Drupal | Ioan Toma | UIBK | Tool, Produce, Consume, Provisioning, DataManagement
ELLY | ELLY | Ioan Toma | UIBK | Tool, Produce, Provisioning
Enrycher | Enrycher | Delia Rusu | JSI | Tool, Produce, Consume
GSN | GSN (Global Sensor Networks) | Hoyoung Jeung | EPFL | Tool, Produce, Publish, Consume
Geometry2RDF | Geometry2RDF | Boris Villazon-Terrazas | UPM | Tool, Produce, Provisioning
IRIS | IRIS - Integrated Rule Inference System | Ioan Toma | UIBK | Tool, Produce, Publish, Provisioning
LDIF | LDIF | Christian Bizer | FUB | Tool, Produce, Consume
LDSpider | LDSpider (Linked Data Spider) | Andreas Harth | KIT | Tool, Consume
LOD Catalog Entry Validator | LOD Cloud Catalog Entry Validator | Pablo Mendes | FUB | Tool, Publish
LarKC | LarKC (Large Knowledge Collider) | Iker Larizgoitia | UIBK | Tool, Produce, Consume
MonetDB | MonetDB | Ying Zhang | CWI | Tool, DataManagement, Consume
NOR2O | NOR2O | Boris Villazon-Terrazas | UPM | Tool, Produce, Provisioning
ODEMapster | R2O&ODEMapster | Freddy Priyatna | UPM | Tool, Produce
OKKAM | OKKAM | Zoltan Miklos | EPFL | Tool, Produce, Publish, Consume
Ontology Mapping Tool | Ontology Mapping Tool (OMT) | Srdjan Komazec | UIBK | Tool, Produce, Consume
Pubby | Pubby | Christian Bizer | FUB | Tool, Publish
R2R | R2R | Christian Bizer | FUB | Tool, Produce, Consume
RDF-AI | RDF-AI | Srdjan Komazec | UIBK | Tool, Produce, Consume
S2O | S2O | Oscar Corcho | UPM | Tool, Produce, Publish, Provisioning
SEALS Results Repository Service | SEALS (Semantic Evaluation At Large Scale) | Ioan Toma | UIBK | Tool, DataManagement, Provisioning
SEALS Test Data Repository Service | SEALS (Semantic Evaluation At Large Scale) | Ioan Toma | UIBK | Tool, DataManagement, Provisioning
SEALS Tools Repository Service | SEALS (Semantic Evaluation At Large Scale) | Ioan Toma | UIBK | Tool, DataManagement, Provisioning
SESA | SESA (Semantic Enabled Service Architecture) | Alex Oberhauser | UIBK | Tool, Produce, Consume
SKOS2OWL | SKOS2OWL | Martin Hepp | UIBK | Tool, Produce, Consume
Shortipedia | Shortipedia: Aggregating and Curating Semantic Web Data | Denny Vrandecic | KIT | Tool, Produce, Publish, DataManagement
Silk | Silk | Christian Bizer | FUB | Tool, Produce, Consume
Tsc++ | tsc++ (Triple Space Computing) | Michael Fried | UIBK | Tool, Produce, Consume
WSMT | Web Service Modeling Toolkit (WSMT) | Srdjan Komazec | UIBK | Tool, Produce, Consume

Best Practice Documents

Entry | Name | Short description | Contact person | Lead partner | Category
Benchmark for Model-Based Approaches to Sensor Data Compression | Benchmark for Model-Based Approaches to Sensor Data Compression | This work presents a benchmark that offers a comprehensive empirical study comparing the performance of model-based compression techniques. | Nguyen Quoc Viet Hung | EPFL | BestPractice
Catalogue of Pitfalls | Catalogue of Ontology Engineering Pitfalls | Provides a catalogue of common worst practices in ontology engineering, aiming to help users avoid pitfalls in their models. | Oscar Corcho | UPM | BestPractice
State of the LOD cloud | State of the LOD Cloud | Provides statistics about the structure and content of the LOD cloud and analyzes the extent to which LOD data sources implement nine best practices. | Pablo Mendes | FUB | BestPractice

Storage Tools

GSN (Global Sensor Networks)

Short Info: database middleware designed to facilitate the deployment and programming of sensor networks. The software takes data (either directly from a sensor or from a CSV file), stores it in a database and provides a web-based query interface. What sets GSN apart from most web-based sensor access tools is that it is fully generic and can handle sensors of all types. Using GSN, you can ignore the low-level details of the sensor network and focus on the high-level application logic. GSN consists of several parts: a data acquisition module, a database module, a web-based query module and an external web services module. GSN is regularly updated by an open source community.

Website: GSN official website

Responsible Partner: EPFL

Contact person: Hoyoung Jeung
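
The acquisition, storage and query steps described for GSN can be illustrated with a minimal sketch. This is not GSN code and does not use the GSN API: the table layout, the column names and the sample readings are all assumptions made for the example, and SQLite stands in for the database module.

```python
import csv
import io
import sqlite3

# Hypothetical sensor readings; a real deployment would read from a sensor or a CSV file.
SAMPLE_CSV = """sensor_id,timestamp,temperature
s1,2012-03-01T10:00:00,4.5
s1,2012-03-01T10:05:00,4.9
s2,2012-03-01T10:00:00,-1.2
"""

def load_readings(conn, csv_text):
    """Acquisition + storage: parse CSV rows and insert them into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(sensor_id TEXT, timestamp TEXT, temperature REAL)"
    )
    rows = [
        (r["sensor_id"], r["timestamp"], float(r["temperature"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

def average_temperature(conn, sensor_id):
    """Query: the kind of request a query interface would pass to the database."""
    row = conn.execute(
        "SELECT AVG(temperature) FROM readings WHERE sensor_id = ?",
        (sensor_id,),
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
load_readings(conn, SAMPLE_CSV)
print(average_temperature(conn, "s1"))
```

The application code only sees `load_readings` and `average_temperature`; how the readings were produced stays hidden, which is the separation the description above refers to.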

OKKAM

Short Info: the project aims at enabling the Web of Entities, a virtual space where any collection of data and information about any type of entity (e.g. people, locations, organizations, events, products) published on the Web can be integrated into a single virtual, decentralized, open knowledge base (as the Web did for hypertexts). OKKAM contributes to this vision by supporting convergence towards a single, globally unique identifier for any entity named on the Web. The intuition behind the project is that realizing the Web of Entities requires tools and practices that cut at the root the proliferation of unnecessary new identifiers for entities that already have a public identifier (OKKAM's razor). OKKAM therefore offers content creators, editors and developers a global infrastructure and a collection of new tools and plugins that help them find public identifiers for the entities named in their content or services, use these identifiers to create annotations, and build new network-based services that make essential use of them in an open environment (such as the Web or large intranets).

Website: OKKAM official website

Responsible Partner: EPFL

Contact Person: Zoltan Miklos

LDSpider (Linked Data Spider)

Short Info: web crawling framework for the linked data web. The requirements and challenges of crawling the linked data web differ from those of regular web crawling, so this project offers a web crawler adapted to traverse and harvest sources and instances from the linked data web. It is distributed as a single jar that can easily be integrated into your own applications. LDSpider scales to small to medium-sized datasets, on the order of hundreds of millions of triples.

Website: LDSpider official website

Responsible Partner: KIT

Contact person: Andreas Harth

R2O/ODEMapster

Short Info: The UPM framework to Upgrade Relational Legacy Data to the Semantic Web consists of:

  • R2O, a fully declarative, XML-based language for describing arbitrarily complex mapping expressions between ontology elements (concepts, attributes and relations) and relational elements (relations and attributes).
  • The ODEMapster processor, which generates Semantic Web instances from relational instances based on the mapping description expressed in an R2O document.
  • The ODEMapster plugin, which provides a graphical user interface for creating, executing, and querying mappings between ontologies and databases.

Website: ODEMapster plugin website

Responsible Partner: UPM

Contact person: Freddy Priyatna

S2O Platform

Short Info: A framework for providing ontology-based access to streaming data. It consists of:

  • SPARQLStream: a SPARQL extension for RDF streams.
  • S2O: an extension to R2O for expressing mappings from streaming sources to an ontology.
  • SNEE: a query processing engine over relational data streams.

Website: SPARQL-Stream: Ontology-based Access to Data Streams

Responsible Partner: UPM

Contact person: Oscar Corcho

MonetDB

Short Info: an open-source column-store database system for high-performance applications in data mining, OLAP, GIS, XML Query, and text and multimedia retrieval. It is designed, implemented and maintained by the Database Architectures group at CWI Amsterdam. Products:

  • MonetDB/SQL: the MonetDB relational database solution.
  • MonetDB/XQuery: the MonetDB XML database solution.
  • MonetDB Server: the MonetDB multi-model database server.

Website: MonetDB website

Responsible Partner: CWI

Contact person: Ying Zhang

D2R Server

Short Info: D2R Server is a tool for publishing relational databases on the Semantic Web. It enables RDF and HTML browsers to navigate the content of the database, and allows applications to query the database using the SPARQL query language.

Website: http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/

Responsible Partner: FUB

Contact person: Christian Becker

Silk

Short Info: The Silk framework is a tool for discovering relationships between data items within different Linked Data sources. Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web. It includes Silk Workbench, a web application which guides the user through the process of interlinking data sources.

Website: http://www4.wiwiss.fu-berlin.de/bizer/silk

Responsible Partner: FUB

Contact person: Robert Isele

Pubby

Short Info: Much Semantic Web data lives inside triple stores and can be accessed only by sending SPARQL queries to a SPARQL endpoint, which makes it hard to connect information in these stores with external data sources. Pubby makes it easy to turn a SPARQL endpoint into a Linked Data server by adding a Linked Data interface to the endpoint. It is implemented as a Java web application.

Website: http://www4.wiwiss.fu-berlin.de/pubby

Responsible Partner: FUB

Contact person: Richard Cyganiak

DBpedia Spotlight

Short Info: DBpedia Spotlight is a tool for annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia.

Website: http://spotlight.dbpedia.org

Responsible Partner: FUB

Contact person: Pablo Mendes

Geometry2RDF

Short Info: A tool that generates RDF triples from geometric information available in GML or WKT.

Website: http://mccarthy.dia.fi.upm.es/geometry2rdf/

Responsible Partner: UPM

Contact person: Boris Villazon-Terrazas

CumulusRDF

Short Info: CumulusRDF is an RDF store for cloud-based architectures. It provides a REST-based API with CRUD operations to manage RDF data. The current version uses Apache Cassandra as its storage backend.

Website: http://code.google.com/p/cumulusrdf/

Responsible Partner: KIT

Contact person: Andreas Harth

LDIF

Short Info: To make Web data easier to use in an application, it is advisable to translate the data to a single target vocabulary (vocabulary mapping) and to replace URI aliases with a single target URI on the client side (identity resolution) before running SPARQL queries against the data. Until now, there have been no integrated tools that help application developers with these tasks. LDIF aims to fill this gap with an open-source Linked Data Integration Framework that Linked Data applications can use to translate Web data and normalize URI aliases.

Website: http://www4.wiwiss.fu-berlin.de/bizer/ldif/

Responsible Partner: FUB

Contact person: Andreas Schultz
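
The two steps named in the LDIF description, vocabulary mapping and identity resolution, can be sketched on plain triples. This is an illustration of the idea only, not LDIF's implementation or configuration format; the source vocabularies, the alias table and the helper `normalize` are hypothetical.

```python
# Hypothetical mapping from source predicates to one target vocabulary.
VOCAB_MAP = {
    "http://example.org/vocabA/name": "http://xmlns.com/foaf/0.1/name",
    "http://example.org/vocabB/label": "http://xmlns.com/foaf/0.1/name",
}

# Hypothetical URI aliases that all denote the same entity.
URI_ALIASES = {
    "http://example.org/a/berlin": "http://dbpedia.org/resource/Berlin",
    "http://example.org/b/Berlin_City": "http://dbpedia.org/resource/Berlin",
}

def normalize(triples):
    """Apply vocabulary mapping to predicates and identity resolution to
    subjects and objects; unknown terms pass through unchanged."""
    result = set()
    for s, p, o in triples:
        result.add((
            URI_ALIASES.get(s, s),   # identity resolution on the subject
            VOCAB_MAP.get(p, p),     # vocabulary mapping on the predicate
            URI_ALIASES.get(o, o),   # identity resolution on the object
        ))
    return result

# Two sources describe the same city with different URIs and vocabularies.
source_triples = [
    ("http://example.org/a/berlin", "http://example.org/vocabA/name", "Berlin"),
    ("http://example.org/b/Berlin_City", "http://example.org/vocabB/label", "Berlin"),
]

for triple in sorted(normalize(source_triples)):
    print(triple)
```

After normalization both source statements collapse into a single triple, so a query written against the target vocabulary and target URI finds the data regardless of which source it came from.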

R2R

Short Info: The R2R Framework consists of a mapping language for expressing term correspondences, best practices on how to publish mappings on the Web, and a Java API for transforming data according to these mappings. The syntax of the R2R mapping language is very similar to the query language SPARQL, which makes it easier to learn. The mapping language covers value transformation for use cases where RDF datasets use different units of measurement and can handle one-to-many and many-to-one correspondences between vocabulary elements. The R2R Java API transforms Web data to a given target vocabulary.

Website: http://www4.wiwiss.fu-berlin.de/bizer/r2r/

Responsible Partner: FUB

Contact person: Andreas Schultz

NOR2O

Short Info: Non-Ontological Resources (NORs) are knowledge resources whose semantics have not yet been formalized by an ontology.

Website: http://mccarthy.dia.fi.upm.es/nor2o/

Responsible Partner: UPM

Contact person: Boris Villazon-Terrazas

Projects

LOD data quality analysis framework

Short Description: Freie Universität Berlin (FUB) has extensive experience with LOD, and KIT Karlsruhe has developed a crawler that can collect the links. The links will be stored in MonetDB, the column store from CWI Amsterdam. Both FUB and EPFL are interested in studying this crawled dataset, particularly in terms of data quality.

Participating Institutions: FUB, KIT, CWI, EPFL

Start Date:

End Date:

Project Events:

SwissEx data exploitation

Short Description: UPM (Madrid) and IJS (Jožef Stefan Institute, Slovenia) will use GSN and access some of the SwissEx sensors.

Participating Institutions: UPM, IJS, EPFL

Start Date:

End Date:

Project Description: Sensor network deployments are a primary source of massive amounts of data about the real-world that surrounds us, measuring a wide range of physical properties in real time. However, in large-scale deployments it becomes hard to effectively exploit the data captured by the sensors, since there is no precise information about what devices are available and what properties they measure. Even when metadata is available, users need to know low-level details such as database schemas or names of properties that are specific to a device or platform.

Therefore, the task of coherently searching, correlating and combining sensor data becomes very challenging. We have realized an ontology-based approach that exposes sensor observations in terms of ontologies enriched with semantic metadata, providing information such as which sensor recorded what, where, when, and under which conditions. To this end, we allow the definition of virtual semantic streams, whose ontological terms are related to the underlying sensor data schemas through declarative mappings and which can be queried in terms of a high-level sensor network ontology.


Our application setting involves semantically enriched query processing based on ontology information. For example, two users may label two sensors as being of types "temperature" and "thermometer", yet the query processor in the framework can recognize that both sensors belong to the same type and include them in the query results.

  • The software framework employs the SSN ontology (UPM), along with domain-specific ontologies, to model the underlying heterogeneous sensor data sources effectively, and establishes mappings between the current sensor data model and the SSN ontology observations using a declarative mapping language.
  • The framework enables scalable search over distributed sensor data. Specifically, the query processor first looks up ontology-enabled metadata to find which distributed nodes maintain the sensor data satisfying a given query condition. It then dynamically composes URL API requests to the corresponding data sources at the distributed GSN nodes (EPFL).
  • Our framework has been developed in close collaboration with expert users from environmental science and engineering, and thus reflects central and immediate requirements of the affected user community on the use of federated sensor networks. The resulting system has been running as the backbone of the Swiss Experiment platform, a large-scale real federated sensor network.
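
The lookup-then-request flow described above can be sketched as follows. Everything in this example is assumed for illustration: the synonym table stands in for the ontology, the metadata records are invented, and the node addresses, the `/data` path and the query parameter names are hypothetical and do not reflect the actual GSN URL API.

```python
from urllib.parse import urlencode

# Stand-in for ontology knowledge: different user labels map to one sensor type.
TYPE_SYNONYMS = {"thermometer": "temperature", "temperature": "temperature"}

# Hypothetical metadata describing which node hosts which sensor.
SENSOR_METADATA = [
    {"node": "http://gsn-node-a.example.org", "sensor": "station1_temp", "type": "temperature"},
    {"node": "http://gsn-node-b.example.org", "sensor": "station7_thermo", "type": "thermometer"},
    {"node": "http://gsn-node-b.example.org", "sensor": "station7_wind", "type": "wind_speed"},
]

def canonical_type(label):
    """Resolve a user-supplied label to its canonical sensor type."""
    return TYPE_SYNONYMS.get(label, label)

def find_sensors(requested_type):
    """Step 1: metadata lookup, matching on canonical types rather than raw labels."""
    wanted = canonical_type(requested_type)
    return [m for m in SENSOR_METADATA if canonical_type(m["type"]) == wanted]

def compose_requests(requested_type, start, end):
    """Step 2: build one request URL per matching sensor on its hosting node."""
    urls = []
    for m in find_sensors(requested_type):
        query = urlencode({"sensor": m["sensor"], "from": start, "to": end})
        urls.append(f'{m["node"]}/data?{query}')
    return urls

for url in compose_requests("thermometer", "2012-03-01", "2012-03-02"):
    print(url)
```

A query for "thermometer" returns requests for both the sensor labelled "temperature" and the one labelled "thermometer", while the wind sensor is left out, because matching happens on the canonical type.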

Use case

Gaining Knowledge from Streaming Data

Short Description: a tool for gaining more knowledge from streaming data by integrating the SPARQLStream processor of UPM (Madrid) and the MonetDB/DataCell engine of CWI (Centrum Wiskunde & Informatica).

Participating Institutions: UPM, CWI

Start Date: March 2012

End Date:

Project Description:

Recent developments in mobile technologies and wireless communication have resulted in an avalanche of streaming data. In just about a decade, streaming data has become ubiquitous in our daily life. Sensor data is a major class of streaming data and the one with the longest history, so the work presented in this document focuses on it. Nowadays, sensors are used in a broad range of applications, such as weather forecasting, traffic management, satellite imaging for earth observation, elderly care and seismic event detection. Millions of sensors bring us not only a vast amount of data, but also data sources that vary in content, format, modality and quality. This creates opportunities for new kinds of applications that use many data sources simultaneously and thus achieve functions not possible with any single sensor network. Such applications require the use of heterogeneous and rapidly changing data sources in an integrated manner.

Data stream management systems (DSMSs) have focused on efficiently managing and processing streaming data. These issues are mainly addressed in the context of individual sensor networks. Existing DSMSs do not provide tools to publish and share streaming data, nor do they try to derive knowledge from it. As a result, applications dealing with streaming data are tied closely to one or a few sensor networks and are mostly available only within the same organisation. Semantic Web technologies, on the other hand, have focused on how to publish and interlink data on the World Wide Web, and how to perform complex reasoning tasks on the data. However, these technologies do not take rapid changes in the data into account.

The lack of integration and communication between different sensor networks often isolates important data streams and intensifies the existing problem of too much streaming data but not enough knowledge. To tackle these problems, researchers have looked for solutions in which Semantic Web technologies are adapted and extended for publishing, sharing, analysing and understanding streaming data. Various proposals have emerged that address open issues such as how to apply reasoning to streaming data; how to publish raw streaming data on the Semantic Web and connect it to existing Semantic Web datasets; and how to apply the query language of semantic data to streaming data.

To advance the state of the art in applying Semantic Web technology to streaming data, both to gain more knowledge from it and to derive new knowledge based on it, this small project integrates tools provided by PlanetData partners UPM and CWI into an ontology-based framework for accessing, publishing, sharing and reasoning over dynamic data produced by mobile devices, and for interlinking streaming data with static RDF data. SPARQLStream is a SPARQL 1.1 extension that enables ontology-based querying of streaming data. MonetDB/DataCell is a streaming data processing engine based on the relational database system MonetDB and is being developed by CWI. In this framework, users can express their use cases for streaming data, possibly combined with static RDF data, in the SPARQLStream query language. The SPARQLStream query processor then translates the queries into the relational continuous query language supported by MonetDB/DataCell for processing.

As an example, let's look at how the envisioned framework can be used to gain and derive knowledge from weather data generated by weather sensor stations.

  • Basic use case: if a user wants to know the total amount of precipitation in an area, then, based on the rdfs:subClassOf property, the system should be able to recognise not only rainfall as precipitation, but also other kinds of precipitation, such as snowfall and hail.
  • Advanced use case: assume the weather data is linked to the DBpedia dataset through, e.g., the owl:sameAs property, which denotes that a place described by the DBpedia dataset is the same as the place in which a sensor is located. Then one can formulate semantic queries to find the areas that should be evacuated, if an extremely large amount of precipitation has been observed by the weather sensors.
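
Both use cases can be illustrated with a small sketch in plain Python. It does not use SPARQLStream or MonetDB/DataCell; the class hierarchy, the station identifiers, the owl:sameAs links, the readings and the threshold are all hypothetical sample data chosen for the example.

```python
# rdfs:subClassOf edges (child class -> parent class); sample data.
SUBCLASS_OF = {
    "Rainfall": "Precipitation",
    "Snowfall": "Precipitation",
    "Hail": "Precipitation",
}

# owl:sameAs links from sensor locations to DBpedia places; sample data.
SAME_AS = {
    "station:A": "http://dbpedia.org/resource/Davos",
    "station:B": "http://dbpedia.org/resource/Lausanne",
}

# (station, observed class, amount in mm); sample data.
OBSERVATIONS = [
    ("station:A", "Rainfall", 80.0),
    ("station:A", "Snowfall", 45.5),
    ("station:A", "WindSpeed", 30.0),
    ("station:B", "Hail", 2.5),
    ("station:B", "Rainfall", 10.0),
]

def is_subclass_of(cls, ancestor):
    """Walk rdfs:subClassOf edges upwards until the ancestor is found or the chain ends."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def precipitation_per_station(observations):
    """Basic use case: sum every observation whose class is a kind of Precipitation."""
    totals = {}
    for station, cls, amount in observations:
        if is_subclass_of(cls, "Precipitation"):
            totals[station] = totals.get(station, 0.0) + amount
    return totals

def places_over_threshold(observations, threshold_mm):
    """Advanced use case: follow owl:sameAs to name the places with extreme precipitation."""
    totals = precipitation_per_station(observations)
    return sorted(
        SAME_AS[station]
        for station, total in totals.items()
        if total > threshold_mm and station in SAME_AS
    )

print(precipitation_per_station(OBSERVATIONS))
print(places_over_threshold(OBSERVATIONS, 100.0))
```

The wind speed reading is ignored because it is not a subclass of precipitation, while rainfall, snowfall and hail are all counted without the query having to list them explicitly.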