Context Data Model 2.0
Version
- This page describes version 2.0 of the Context Data Model
- See Context Data Model 1.0 for the released Higgins 1.0 version
Introduction
Although the CDM can be used for almost any kind of data, its focus is to provide a foundation for integrating, unifying, and sharing identity-related data. In particular, we are focused on information about a person, a group, or an entire organization. This might include contact information, authentication data, preferences, email addresses, interests, and employer-related information. An object representing a single person might have relationships to other objects and other people in the same or different data contexts.
See Higgins Data Model Intro (PPT) for an overview. Note that it has not been updated to CDM 2.0.
Motivation
Information fragmentation is a pervasive problem. Even seemingly simple activities depend on information from a number of heterogeneous sources. The information may be fragmented by physical location, device, application, middleware, data storage, or platform. With a common data model, data from multiple locations and systems can at best be unified, or at least be correlated.
There is a great deal of interest among Web developers in solving interoperability problems and providing data portability. See, for example, http://DataPortability.org and many other related efforts. In this quest, the Context Data Model can be a powerful enabler for interoperability of identity-related information across the "silos."
Why a Common Model?
There are approaches to data unification other than providing a common data model. However, every unification strategy involves choosing some kind of lowest common denominator; the question is how low. The lower the level, the easier the unification, but the more lossy it is. For example, consider raw text: it is easy to index, search, and copy/paste, but very lossy. Or consider XML, which offers a common syntax for describing the attributes of a given object and the values of each attribute, but still without any defined semantics.
Kinds of Data
The data model's focus is on the unification of identity-related data. We need to be able to create rich, contextualized representations of people, groups, and organizations. These objects have attributes that range from simple literals (identification attributes, authentication data, names, email addresses, and telephone numbers) to complex attributes that are essentially links to other objects: people, groups, documents, calendar events, music preferences, and so on. These relationship attributes might be "friend", "manager", "likes", "owns", etc.
A key innovation in the model is the Higgins correlation attribute. If object a has a correlation link to object b, this implies that both a and b are representations of the same person, organization, thing, or concept that exists outside of the Higgins model. Since a and b may be in different contexts, each using differing and incompatible semantics, the semantics of the correlation attribute are much weaker than saying that a and b are "the same" and that their descriptions can therefore be logically merged (as would be implied, for example, by owl:sameAs).
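The difference between correlating two objects and merging them can be illustrated with a minimal Python sketch. It is not the Higgins API: the Entity class, the "correlation" attribute id, and the urn:example identifiers are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """Hypothetical stand-in for a CDM object; not the Higgins API."""
    entity_id: str
    context: str
    # attribute id -> list of values (literals or ids of other entities)
    attributes: Dict[str, List[str]] = field(default_factory=dict)


def correlate(a: Entity, b: Entity) -> None:
    """Give a a correlation link to b.

    Only a reference is stored. Nothing is copied from b into a, because
    the two contexts may use incompatible semantics for their attributes.
    """
    a.attributes.setdefault("correlation", []).append(b.entity_id)


# Two descriptions of the same person held in different contexts.
work = Entity("urn:example:work:alice", "work-directory",
              {"title": ["Engineer"]})
social = Entity("urn:example:social:al", "social-network",
                {"title": ["Ms."]})  # same attribute id, different meaning

correlate(work, social)

print(work.attributes["correlation"])  # ['urn:example:social:al']
print(work.attributes["title"])        # ['Engineer']
```

A merge in the style of owl:sameAs would instead have to combine the two "title" values, even though they mean different things in the two contexts. The correlation link avoids that by leaving each description intact.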
More about Interoperability
Saying we desire interoperability can mean many different things. At the least it should mean that we can navigate through and inspect data objects and their associated attributes/relationships within any Context through the Higgins IdAS API. This is part of what motivates Context Data Model Goals [2], [3] and [4]. At this level of interoperability we may not understand the meaning of the objects and the attributes, but we can know that they are there.
Beyond inspection and navigation, Higgins aspires to support applications that can also edit context data. We envision Higgins-based applications with user interfaces that can manipulate data contained in any Context from any Context Provider bound into Higgins. This implies two things:
- We require that the semantics of the attributes of objects be defined in a single well-defined (unambiguous) manner. If the model has more degrees of freedom than the absolute minimum necessary, ambiguity will arise where different Context Providers express the same semantics in different ways. For more about this see [6] in Data Model Goals.
- The specific schema of a Context's use of the abstract CDM must be exposed at the Context Provider (SPI) and IdAS (API) levels. This exposure allows an application to know what the valid degrees of freedom in the structure of the data are, and what values its data fields may assume. The application can learn from the schema what datatypes are used to describe the value of a given attribute (e.g. a string, a non-zero number, or a date). It can learn what kinds of attributes may optionally be added to an object (and which may not). And it can learn what kinds of required and/or optional relationships are allowed with objects of various kinds. For more about the need for a common schema language see [7] in Data Model Goals.
Context Data Model Goals provides an enumeration of the original top-level design goals that ultimately led to the decision to use an RDF-based metamodel.
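As a rough illustration of why an exposed schema matters to an editing application, the following Python sketch checks a proposed edit against a schema before applying it. Everything here is an assumption made for the example: AttributeRule, PERSON_SCHEMA, and check_edit are hypothetical names and do not reflect the actual schema language or the IdAS API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AttributeRule:
    """Hypothetical schema entry: datatype and cardinality of one attribute."""
    datatype: type
    min_values: int = 0
    max_values: Optional[int] = None  # None means "no upper bound"


# Hypothetical schema that a context might expose for its person objects.
PERSON_SCHEMA: Dict[str, AttributeRule] = {
    "name": AttributeRule(str, min_values=1, max_values=1),
    "email": AttributeRule(str),
    "birthDate": AttributeRule(date, max_values=1),
}


def check_edit(schema: Dict[str, AttributeRule],
               attribute_id: str,
               values: List[object]) -> List[str]:
    """Return the problems with a proposed edit; an empty list means valid."""
    rule = schema.get(attribute_id)
    if rule is None:
        return [f"{attribute_id}: attribute not allowed by the schema"]
    problems = []
    if len(values) < rule.min_values:
        problems.append(
            f"{attribute_id}: needs at least {rule.min_values} value(s)")
    if rule.max_values is not None and len(values) > rule.max_values:
        problems.append(
            f"{attribute_id}: allows at most {rule.max_values} value(s)")
    for value in values:
        if not isinstance(value, rule.datatype):
            problems.append(
                f"{attribute_id}: {value!r} is not a {rule.datatype.__name__}")
    return problems


print(check_edit(PERSON_SCHEMA, "email", ["a@example.org", "b@example.org"]))  # []
print(check_edit(PERSON_SCHEMA, "birthDate", ["1990-01-01"]))
```

A user interface could run the same check to decide which fields to offer, which to mark as required, and which input widgets to use, without knowing anything else about the Context Provider behind the data.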
Relationship to RDF
The Context Data Model (CDM) encompasses the core semantics of the W3C's Resource Description Framework (RDF); anything expressible in RDF, with the exception of blank nodes, is expressible in the CDM although the converse isn't true.
Most of the subtle but important differences between CDM and RDF are derived from differences in the choice of identifiers used to identify objects in each model. RDF is based on pure HTTP URIs, whereas CDM is based on a more generalized URI called a UDI. Objects identified by URIs in RDF are called Resources, whereas in CDM they are identified by UDIs and are called Entities.
CDM entities, attributes, and values form interconnected graphs of objects. A graph is contained in a data set called a Context, which is analogous to an RDF dataset. Within this main context graph there may be zero or more sub-graphs known as sub-contexts. Each sub-context is analogous to a named graph in the Named Graph extensions to RDF. Like named graphs, contexts are themselves entities that may have arbitrary attributes and values.
The CDM also differs from RDF on a syntactic (semantically lossless) level. In RDF, an object may have N properties of type T, each of which has a single value, whereas in the CDM an object may have only 0..1 property of type T, and if the property exists it has 1..N values. Further, in the CDM these properties are called Attributes.
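The claim that this regrouping is semantically lossless can be checked with a small Python sketch. It uses plain tuples and dictionaries rather than any RDF or Higgins library, and the identifiers and attribute names are made up for the example.

```python
from collections import defaultdict


def triples_to_attributes(triples):
    """Group RDF-style (subject, property, value) triples CDM-style.

    Result: entity id -> attribute id -> list of 1..N values, so each
    entity has at most one attribute of a given type.
    """
    entities = defaultdict(lambda: defaultdict(list))
    for subject, prop, value in triples:
        entities[subject][prop].append(value)
    return {s: dict(attrs) for s, attrs in entities.items()}


def attributes_to_triples(entities):
    """Expand CDM-style attributes back into single-valued triples."""
    return {
        (subject, attribute_id, value)
        for subject, attrs in entities.items()
        for attribute_id, values in attrs.items()
        for value in values
    }


triples = [
    ("urn:example:alice", "email", "alice@example.org"),
    ("urn:example:alice", "email", "alice@work.example"),
    ("urn:example:alice", "name", "Alice"),
]

cdm = triples_to_attributes(triples)
print(cdm["urn:example:alice"]["email"])
# ['alice@example.org', 'alice@work.example']

# Round trip: the same set of statements comes back.
print(attributes_to_triples(cdm) == set(triples))  # True
```

The two "email" triples become one attribute with two values, and expanding the attribute again yields the original statements.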
Key Concepts
Top level concepts
- Context - a container of statements about Entities.
- Entity - an object identified by an EntityId.
- Attribute - a property of an Entity or a Context. The Attributes of an Entity are distinguished from one another by their AttributeIds. Attributes have 1..N values. These values may be simple (literals) or complex (other Entities).
- Data Range - a definition of a kind of simple, literal Attribute value. Generally a syntax restriction on one of the XML Schema datatypes.
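One way to picture how these four concepts fit together is the following Python sketch. It is only an illustration under stated assumptions: the class and field names are hypothetical, literals are simplified to strings, and a Data Range is modelled as a regular-expression restriction on a named XML Schema datatype. None of this is the Higgins implementation.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class DataRange:
    """A kind of literal value: a base datatype plus a syntax restriction."""
    base_datatype: str  # e.g. "xsd:string"
    pattern: str        # regular expression the literal must match in full

    def accepts(self, literal: str) -> bool:
        return re.fullmatch(self.pattern, literal) is not None


@dataclass
class Entity:
    """An object identified by an EntityId."""
    entity_id: str
    attributes: Dict[str, "Attribute"] = field(default_factory=dict)


# A value is either simple (a literal) or complex (another Entity).
Value = Union[str, Entity]


@dataclass
class Attribute:
    """A property of an Entity or a Context, holding 1..N values."""
    attribute_id: str
    values: List[Value]

    def __post_init__(self):
        if not self.values:
            raise ValueError("an Attribute must have at least one value")


@dataclass
class Context:
    """A container of Entities that can carry Attributes of its own."""
    context_id: str
    entities: Dict[str, Entity] = field(default_factory=dict)
    attributes: Dict[str, Attribute] = field(default_factory=dict)


phone = DataRange("xsd:string", r"\+?[0-9 ()-]{7,20}")

alice = Entity("urn:example:alice")
bob = Entity("urn:example:bob")
alice.attributes["telephone"] = Attribute("telephone", ["+1 555 0100"])
alice.attributes["manager"] = Attribute("manager", [bob])  # complex value

ctx = Context("urn:example:ctx:contacts")
ctx.entities[alice.entity_id] = alice
ctx.entities[bob.entity_id] = bob

print(phone.accepts("+1 555 0100"))   # True
print(phone.accepts("not a number"))  # False
```

Keying attributes by AttributeId in a dictionary reflects the rule that an Entity has at most one Attribute of a given type, while the non-empty list enforces the 1..N values requirement.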
cdm.owl
CDM uses concepts that, while they can be approximated in OWL, are unique to CDM and are not compatible with existing RDF/OWL data sources. These are described in a file called Cdm.owl 1.1. This cdm.owl file is provided only as a description, written in RDF/OWL, of the foundational concepts of CDM (e.g. "Entity"). It should not be imported or used in creating ontologies; it was created only as a description of the CDM metamodel itself.