
HGraph


This page describes HGraph, a new component API being considered for Higgins 2.0.

Motivation

The initial motivation for HGraph was to address a number of limitations of IdAS related to describing and managing metadata. Rather than changing IdAS, this page describes a new component called HGraph that is layered over IdAS. It not only addresses these limitations but also provides some additional capabilities.

Some Higgins-based applications need to be able to conveniently describe and manage metadata about what might be called base entities. Examples include the need to cleanly separate provenance metadata entities from the base entities, or the need to associate a set of attributes with a complex-valued (link-valued) attribute. While this is possible with IdAS, it is far from convenient.

The cleanest way to handle metadata is to first place the base entities in a new context and then to add the desired metadata (we'll use that term) attributes to the entity representing the new context.

An example of this approach is diagrammed below. We have two contexts (represented by c1 and c2), both of which make statements about entity e1. The left-hand context, for which the RMV is the authority, makes the statement (e1 eye-color "blue"), whereas the right-hand context, for which the State Department is authoritative, makes the statement (e1 eye-color "green"). As you can see, these two authorities disagree about e1's eye color.

Provenance 2.0.100.png

The problem lies in the fact that IdAS might be said to be context-centric. What we mean by this is that whereas the authoritative context for a given entityId can be discovered dynamically, there is no convenient and fast way to assemble all of the other (non-authoritative) contexts that may also make statements about a given entity (of the same entityId). In practice this is sufficiently awkward that the general IdAS idiom is simply not to do so: a given entityId typically occurs in only a single context, its authoritative context.

The IdAS API consumer has the burden of keeping track of all of the contexts that contain a given entityId and merging together the attributes of each. The reason for the merge is that most IdAS consumers simply want to know the value of some attribute of e1 and don't care which context made the statement. In the previous example most IdAS consumers simply want to know the eye-color of e1. In other words they simply want to get the values "blue" and "green".

To complicate matters further, in order for the IdAS consumer to have acceptable performance it would need to maintain its own cache of the list of all of the contexts that make statements about e1.

Note that due to the open world assumption neither the consuming code nor IdAS can ever be guaranteed to know about all of the contexts that exist on the net that make statements about e1.

Design

HGraph is a new component that layers over IdAS and provides an entity-centric view of the data managed by IdAS below. The intent is to evolve the current code above IdAS to run on top of HGraph instead.

Hgraph 2.png

The HGraph component exposes compound entities (hgraph.IEntity instances) as opposed to the regular idas.IEntity objects. Given an entityId, e1, HGraph will return a compound entity. A compound entity has one or more sets of attribute/values where each set is a cached copy of the attributes of e1 in each of N contexts.

API

In order to make it as easy as possible for current IdAS consumer code to be evolved to consume the HGraph API instead of the IdAS API, the plan is to make the HGraph API as similar as possible to the IdAS API. The HGraph API will be a pure super-set of the IdAS API (including the idas.udi API).

We may be able to design the hgraph.IEntity interface as simply the idas.IEntity interface plus some additional methods. If so, the hgraph.IEntity methods that share a name with the original idas.IEntity methods will have slightly different semantics. The difference is that hgraph.IEntity will return the union of all attributes found on all instances of the entity in question, irrespective of context. Since at present no known IdAS consumers assume that a given entity exists in more than its sole authoritative context, this change in semantics won't break any of these consumers.
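The compound entity and its union semantics can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the class name CompoundEntity, its method names, and the use of plain strings for context ids, attribute names and values are hypothetical and are not the hgraph.IEntity interface, which has not been designed yet.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a compound entity; NOT the real hgraph.IEntity. */
final class CompoundEntity {
    private final String entityId;
    // contextId -> (attribute name -> values asserted in that context)
    private final Map<String, Map<String, Set<String>>> byContext = new LinkedHashMap<>();

    CompoundEntity(String entityId) {
        this.entityId = entityId;
    }

    String getEntityId() {
        return entityId;
    }

    /** Records the statement (entityId attribute value) as made by one context. */
    void addStatement(String contextId, String attribute, String value) {
        byContext.computeIfAbsent(contextId, c -> new LinkedHashMap<>())
                 .computeIfAbsent(attribute, a -> new LinkedHashSet<>())
                 .add(value);
    }

    /** Union of the attribute's values across all contexts. */
    Set<String> getValues(String attribute) {
        Set<String> union = new LinkedHashSet<>();
        for (Map<String, Set<String>> attrs : byContext.values()) {
            union.addAll(attrs.getOrDefault(attribute, Collections.emptySet()));
        }
        return union;
    }

    /** Values asserted by a single context, for callers that care about provenance. */
    Set<String> getValues(String contextId, String attribute) {
        return byContext.getOrDefault(contextId, Collections.emptyMap())
                        .getOrDefault(attribute, Collections.emptySet());
    }
}
```

With the statements (e1 eye-color "blue") recorded for c1 and (e1 eye-color "green") recorded for c2, getValues("eye-color") returns both values, while getValues("c1", "eye-color") returns only "blue". A consumer that knows about only one context sees the same result from either call, which is why the change in semantics is not expected to break existing code.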

Graph Traversal

The hgraph.IEntity offers improved support for graph traversal vs. idas.IEntity.

In IdAS if you open a context and call IContext.getEntity(eid) you get back an idas.IEntity as you'd expect. However if you enumerate its attributes and find a complex-valued (aka entity-valued) attribute you can directly get its value (an IEntity) only if the entity happens to live within the same context.

If you try the same thing on a complex-valued attribute of an hgraph.IEntity, HGraph returns an hgraph.IEntity value whether the value entity is in the same context or not. The returned hgraph.IEntity is lazily evaluated; it is actually a facade over the "real" IEntity. Attempts to get any of its attributes will cause the facade to dynamically resolve the UDI reference, opening some previously unopened context if necessary, and thus read in the real entity data behind the facade. If authentication materials were needed to open a new context, then an exception would be thrown (or perhaps a callback method called) and the consumer would have to provide the needed materials.
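The lazy facade could look roughly like the following Java sketch. The names LazyEntity, getUdi, isResolved and getAttribute are hypothetical, as is the resolver function that stands in for "resolve the UDI, open the context if needed, and read the entity's attributes"; none of these are existing Higgins APIs.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical facade over an entity that may live in a not-yet-opened context. */
final class LazyEntity {
    private final String udi;
    // Assumed hook: resolves the UDI, opening the context if needed, and
    // returns the entity's attributes. It may throw if credentials are missing.
    private final Function<String, Map<String, String>> resolver;
    private Map<String, String> attributes; // null until the first attribute access

    LazyEntity(String udi, Function<String, Map<String, String>> resolver) {
        this.udi = udi;
        this.resolver = resolver;
    }

    /** Returns the reference itself; this never triggers resolution. */
    String getUdi() {
        return udi;
    }

    boolean isResolved() {
        return attributes != null;
    }

    /** The first call loads the real entity data; later calls reuse it. */
    String getAttribute(String name) {
        if (attributes == null) {
            // If the resolver throws (for example because authentication
            // materials are needed), the facade stays unresolved so the
            // caller can supply them and try again.
            attributes = resolver.apply(udi);
        }
        return attributes.get(name);
    }
}
```

In this sketch, walking to a neighbouring entity costs nothing until one of its attributes is actually read. The sketch is not thread-safe, and it models the exception variant of the authentication question rather than the callback variant.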

Architecture

HGraph contains a write-through cache that implements the SAIL API. Existing open source SAIL storage implementations can be used.

Hgraph 2.0.101b.png

As you can see there are now two new alternative APIs to Higgins data. The first is the new HGraph API that has been described. The second is the SAIL quad store API.

We must implement this architecture such that both of these new APIs can be used by different consumers simultaneously. This is necessary to support an XDI endpoint (the Attribute Service) running on top of HGraph while at the same time supporting the ability for a SAIL-compatible SPARQL engine to run on top of the SAIL API.

Benefits

HGraph was initially conceived in order to make it easier to manage multi-contextual entities (entities that exist in N>1 contexts simultaneously), but it has the following additional benefits:

Cache/index for IdAS

  • HGraph provides a write-through cache for data managed by any and all context providers plugged into IdAS. This provides:
      • A performance boost for read operations from context providers that access remote backing stores
      • Simpler context provider development: context providers no longer have to implement internal data caches
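As a rough illustration of the write-through behaviour, here is a minimal Java sketch. It deliberately does not use the SAIL interfaces: BackingStore is a hypothetical stand-in for a context provider behind IdAS, WriteThroughCache is a hypothetical name, and the string key is an assumed encoding of context, entity and attribute.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a (possibly remote) store behind IdAS. */
interface BackingStore {
    String read(String key);

    void write(String key, String value);
}

/** Write-through cache: every write goes to the store and to the cache. */
final class WriteThroughCache {
    private final BackingStore store;
    private final Map<String, String> cache = new HashMap<>();

    WriteThroughCache(BackingStore store) {
        this.store = store;
    }

    /** Serves from the cache when possible; otherwise loads and remembers. */
    String read(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = store.read(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    /** Writes to the store first, so the cache never holds a rejected write. */
    void write(String key, String value) {
        store.write(key, value);
        cache.put(key, value);
    }
}
```

Repeated reads of the same key reach the backing store at most once, which is where the performance benefit for remote stores would come from. Invalidation when the backing store changes behind the cache is not addressed in this sketch.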

Graph traversal

  • HGraph is entity-centric and lets consumer code walk from entity node to entity node in the graph without having to worry about context boundaries.
  • Contexts are dynamically (and lazily) loaded in HGraph as needed (e.g. during graph traversal); all entities are actually proxies/facades over "real" entity data.

SAIL API

  • HGraph's cache is an implementation of the openrdf.org SAIL RDF quad store API. This provides the following advantages:
      • It provides a new extension point: any SAIL-compatible quad store implementation can be used. In addition, SAIL implementations can be stacked, allowing SAIL-compatible inference engines to be incorporated and thus adding inferencing capabilities to Higgins.
      • The SAIL API provides a new standards-based API to access IdAS data.
      • The SAIL API can be accessed by SPARQL engines, thus providing the Higgins PDS with support for this additional protocol.

Future role of IdAS

If we proceed with HGraph, all of the code that currently consumes IdAS will probably want to migrate to sitting on top of HGraph instead (because it loses nothing and gains a more convenient way to do things like deal with metadata and traverse the graph).

If the above is true then IdAS will then become a lower level API that is only used by HGraph. Context providers that implement the IdAS SPI will essentially be providing adapters for HGraph to adapt/extend it to new data sources.

Interestingly the SAIL interface is itself extensible and stackable. Thus it would seem in the end that HGraph would have two ways that new data sources could be adapted/plugged-in: IdAS context providers and SAIL implementations.

Open Issues

  • How do we implement transactions: single context, multiple context same provider, multiple contexts across providers, external (non-PDS) resources and contexts?
  • How do we authenticate when accessing an entity in another PDS?
  • How do we do lazy evaluation/resolution of attributes and relationships?