Difference between revisions of "Context Data Model 2.0"

(Identity-related concepts)
m (recategorized)
(44 intermediate revisions by 3 users not shown)
Line 1: Line 1:
{{#eclipseproject:technology.higgins}}
+
{{#eclipseproject:technology.higgins|eclipse_custom_style.css}}
 
[[Image:Higgins_logo_76Wx100H.jpg|right]]
 
[[Image:Higgins_logo_76Wx100H.jpg|right]]
  
== Version 1.1 ==
+
== Version ==
* This page describes version 1.1 of the Higgins Context Data Model
+
 
 +
* This page describes version 2.0 of the Context Data Model
 
* See [[Context Data Model 1.0]] for the released Higgins 1.0 version
 
* See [[Context Data Model 1.0]] for the released Higgins 1.0 version
  
 
== Introduction ==
 
== Introduction ==
The [[Context Data Model 1.1 | Context Data Model]] (CDM) is a portion of the [[Higgins Data Model]].
 
  
 
Although the CDM can be used for almost any kind of data, the focus of CDM is to provide a foundation for integrating, unifying, and sharing identity-related data. In particular we are focused on information about a person, a group or an entire organization. This might include contact information, authentication data, preferences, email addresses, interests, employer-related information. An object representing a single person, might have relationships to other objects and other people in the same or different data contexts.
 
Although the CDM can be used for almost any kind of data, the focus of CDM is to provide a foundation for integrating, unifying, and sharing identity-related data. In particular we are focused on information about a person, a group or an entire organization. This might include contact information, authentication data, preferences, email addresses, interests, employer-related information. An object representing a single person, might have relationships to other objects and other people in the same or different data contexts.
  
See:
+
See [http://dev.eclipse.org/viewsvn/index.cgi/trunk/doc/org.eclipse.higgins.doc/Higgins-Data-Model-Intro.ppt?revision=20903&root=Technology_HIGGINS Higgins Data Model Intro (PPT)] for an overview. --has not been updated to CDM 2.0
* [http://dev.eclipse.org/viewsvn/index.cgi/org.eclipse.higgins/trunk/doc/org.eclipse.higgins.doc/Higgins-Data-Model-Intro.ppt?root=Technology_SVN&view=co Higgins Data Model Intro (PPT)] for an overview.
+
* [[Context Data Model Background]] for information about motivations and design goals.
+
  
== CDM Core Semantics ==
+
=== Motivation ===
The Context Data Model (CDM) encompasses the core semantics of the W3C's [http://www.w3.org/RDF/ Resource Description Framework] (RDF); anything expressible in RDF is expressible in the CDM although the converse isn't true.
+
  
Most of the subtle but important differences between CDM and RDF are derived from differences in the choice of identifiers used to identify objects in each model. RDF is based on pure HTTP URIs, whereas CDM is based on a more generalized URI called a [[UDI]]. Objects identified by URIs in RDF are called ''Resources'', whereas in CDM they are identified by [[UDI]]s and are called [[Entity | Entities]].
+
Information fragmentation is a pervasive problem. Even seemingly simple activities depend on information from a number of heterogeneous sources. The information may be fragmented by physical location, device, application, middleware, data storage or platform. By providing a common data model data from multiple locations and systems can be unified at best or at least correlated.
  
The CDM includes a kind of object called a [[Context]] that has no analog in RDF. Individual [[Context]]s can be thought of as containers of portions of the overall graph of objects. [[Context]]s partition the data space into disjoint sets of objects. [[Context Provider]] plug-in implementations map data stored in a various kinds of data stores into objects within [[Context]] boundaries in CDM.
+
There is a great deal of interest among Web developers in solving interoperability and providing data portability. See, for example, http://DataPortability.org and many other related efforts. In this quest, the [[Context Data Model]] can provide powerful enabler for interoperability of identity-related information across the "silos."
  
The CDM also differs from RDF on a syntactic (semantically lossless) level. In RDF an object may have N properties of type T each of which has a single value whereas in the CDM an object may only have 0..1 property of type T, and if the property exists it has 1..N values. Further, in the CDM these properties are called [[Attribute]]s.
+
=== Why a Common Model? ===
  
== Key Concepts ==
+
There are other approaches to data unification than providing a common data model. However every unification strategy involves choosing some kind of lowest common denominator. It is all a question of how low is low. The lower the level, the easier to do the unification, but the more lossy. For example, consider raw text. It's easy to index, search, and copy/paste but very lossy. Or consider XML, which offers a common syntax for describing a series of attributes of a given object and values for each of the attributes, although still without any defined semantics.
  
=== Top level concepts ===
+
=== Kinds of Data ===
  
# [[Context]] - a container of objects called [[Entity | Entities]]. Contexts are identified by a [[ContextId]]
+
The data model's focus is on the unification of identity-related data. We need to be able to create rich, contextualized representations of people, groups and organizations. These objects have attributes that range from simple literals identification attributes, authentication data attributes, names, email addresses and telephone numbers, to complex attributes that are essentially links to other objects, people, groups, documents, calendar events, music preferences, and so on. These relationship attributes might be "friend", "manager", "likes", "owns", etc.
# [[Entity]] - instances of objects (as well as the [[Entity Class]]es and [[Attribute Type]]s that define them) within a [[Context]]. All are identified within a [[Context]] by an [[EntityId]]
+
# [[Attribute]] - a property of an [[Entity]]. Attributes of an [[Entity]] are distinguished from one another by its [[AttributeId]]. [[Attributes]] have 1..N values. These values may be simple (literals) or complex (other [[Entity | Entities]]). Complex-valued [[Attribute]]s are called [[Entity Relation]]s (think "links")
+
# [[Data Range]] - a definition of a kind of simple, literal [[Attribute]] value. Generally a syntax restriction on one of the XML Schema datatypes.
+
  
=== More core concepts ===
+
A key innovation in the model is the a Higgins ''correlation'' attribute. If object a has a correlation link to object b, this implies that both a and b are representations of the same person, organization, thing or concept that exists outside of the Higgins model. Since a and b may be in different contexts each using differing and incompatible semantics, the semantics of the correlation attribute is much weaker than saying that the descriptions of a and b are "the same" and thus their descriptions can be logically merged (as for example would be implied by owl:sameAs).
Here are a few more foundational concepts:
+
* The CDM is self-describing, recursive. We use special kinds of Entities called [[Entity Classes]] to describe classes of entities and we use other kinds of Entities called [[Attribute Type]]s that describe Attributes.
+
* Speaking of recursive, there is [[Statement]] class that allows Attributes to be added to a (single) value of an Attribute of an Entity.
+
* Getting into the weeds a bit, there is also a utility class called [[TimeSpan]] (and related Attributes: validFrom and validTo)
+
  
=== Identity-related concepts ===
+
=== More about Interoperability ===
Building on the core described above, the CDM introduces the following slightly more specialized concepts:
+
* There is an Entity class called [[Agent]] along with three subclasses: [[Group]], [[Organization]] and [[Person]].
+
* Entities and Contexts can be correlated using the [[Entity Correlation]] and [[Context Correlation]] links respectively.
+
* There are a set of pre-defined Attributes:
+
** part, and its sub-attribute member
+
** partOf, and its sub-attribute memberOf
+
* The following Attributes are defined to describe Attributes: displayOrder, category, authority, lastModified, lastVerifiedFromSource, lastVerifyAttempt
+
* Contexts can be correlated using the [[contextCorrelation]] Attribute
+
  
===Access control===
+
Saying we desire interoperability can mean many different things. At the least it should mean that we can navigate through and inspect data objects and their associated attributes/relationships within any [[Context]] through the Higgins IdAS API. This is part of what motivates [[Context Data Model Goals]] [2], [3] and [4]. At this level of interoperability we may not understand the meaning of the objects and the attributes, but we can know that they are there.
* Starting in 1.1M4 we have define a set of Entity Classes and Attributes that describe access control policies.
+
* The new Entity classes are [[Policy]], an abstract superclass for many kinds of policy we might want to model in the future, and [[AccessControl]], a subclass of [[Policy]]
+
* A new abstract super-attribute called [[accessControl]] is defined with these sub-attributes:
+
** onAttribute
+
** operation, and its sub-attributes: add, delete, modify and read
+
** selfOperation, and its sub-attributes: selfAdd, selfDelete, selfModify, selfRead
+
** selfSubject
+
  
== higgins.owl (HOWL) ==
+
Beyond inspection and navigation, Higgins aspires to support applications that can also edit context data. We envision Higgins-based applications with user interfaces that can manipulate data contained in any [[Context]] from any [[Context Provider]] bound into Higgins. This implies two things:
 +
#We require that the semantics of the attributes of objects be defined in a single well-defined (unambiguous) manner. If the model has more degrees of freedom than the absolute minimum necessary, ambiguity will arise where different [[Context Providers]]s express the same semantic in different ways. For more about this see [6] in [[Data Model Goals]].
 +
#The ''specific'' schema of a [[Context|Context's]] use of the abstract CDM must be exposed at the Context Provider (SPI) and IdAS (API) levels. This exposure allows an application to know what the valid degrees of freedom in the structure of the data are, and the values of its data fields may assume. The application can learn from the schema what datatypes are used to describe the value of a given attribute (e.g. a string, a non-zero number or a date, etc). It can learn what kinds of attributes may optionally be added to an object (and which may not), etc. And it can learn or what kinds of required and/or optional relationships are allowed with objects of various kinds. For more about the need for a common schema language see [7] in [[Data Model Goals]].
 +
 
 +
[[Context Data Model Goals]] provides an an enumeration of the original top level design goals that ultimately led to the decision to using an RDF-based metamodel.
 +
 
 +
== Relationship to RDF ==
 +
 
 +
The Context Data Model (CDM) encompasses the core semantics of the W3C's [http://www.w3.org/RDF/ Resource Description Framework] (RDF); anything expressible in RDF, with the exception of blank nodes, is expressible in the CDM although the converse isn't true.
 +
 
 +
Most of the subtle but important differences between CDM and RDF are derived from differences in the choice of identifiers used to identify objects in each model. RDF is based on pure HTTP URIs, whereas CDM is based on a more generalized URI called a [[UDI]]. Objects identified by URIs in RDF are called ''Resources'', whereas in CDM they are identified by [[UDI]]s and are called [[Entity | Entities]].
 +
 
 +
CDM entities, attributes and values form interconnected graphs of objects. A graph is contained in a data set called a [[Context]]. This is analogous to an RDF dataset. Within this main context graph may live zero or more sub-graphs known as sub-contexts. Each of these sub-contexts is analogous to the [http://www.w3.org/2004/03/trix/ Named Graph extensions to RDF]. Like named graphs the contexts are themselves entities that may have arbitrary attributes and values.
 +
 
 +
The CDM differs from RDF on a syntactic (semantically lossless) level. In RDF an object may have N properties of type T each of which has a single value whereas in the CDM an object may only have 0..1 property of type T, and if the property exists it has 1..N values. Further, in the CDM these properties are called [[Attribute]]s.
 +
 
 +
== Key Concepts ==
 +
 
 +
=== Top level concepts ===
 +
 
 +
# [[Context]] - a container of statements about [[Entity | Entities]].
 +
# [[Entity]] - an object identified by an [[EntityId]].
 +
# [[Attribute]] - a property of an [[Entity]] or a [[Context]]. Attributes of an [[Entity]] are distinguished from one another by its AttributeId. [[Attributes]] have 1..N values. These values may be simple (literals) or complex (other [[Entity | Entities]]).
 +
# [[Data Range]] - a definition of a kind of simple, literal [[Attribute]] value. Generally a syntax restriction on one of the XML Schema datatypes.
  
The CDM model is a set of concepts. From 1.1M4 forward, we divide this set of concepts into two subsets. The first subset contains concepts that can are expressed in OWL1.1 and can be directly imported and used as a base ontology by RDF/OWL data sources and systems. These are contained in the file [[higgins.owl 1.1]] -- a file that has been nicknamed ''HOWL.'' This file describes many of the core concepts used in the CDM. [[higgins.owl 1.1]] is an abstract (sometimes called an "upper") ontology for identity information. It is abstract in that it doesn't describe any concrete attributes such as "email address" or "first name". It also doesn't define very specialized classes of objects such as "calendar event" or "student", "movie", "book", etc. These are left to [[Context Provider]]s to define for themselves.
+
== cdm.owl ==
  
The second subset contains CDM concepts that, while they can be approximated in OWL, are unique to CDM and are not compatible with existing RDF/OWL data sources. These are described in a file called [[cdm.owl 1.1]].
+
CDM uses concepts that, while they can be ''approximated'' in OWL, are unique to CDM and are not compatible with existing RDF/OWL data sources. These are described in a file called [[Cdm.owl 1.1]]. This cdm.owl file is provided only as a description using RDF/OWL of the foundational concepts of CDM (e.g. "Entity"). However the cdm.owl file should not be imported or used in creating ontologies, it was created only as a description of the CDM metamodel itself.
  
=== Building on higgins.owl 1.1 ===
 
Developers must create specialized ontologies based on HOWL that describe specific concrete domains.
 
  
For example, if a developer wanted to describe a CRM database, she would create an OWL ontology that would describe the data objects in the CRM database. This CRM database is called a [[Context]] in Higgins. If, for example, the database contained records about customers and those customers had full-names and email addresses, then the developer would define "Customer" as a sub-class of [[Entity]] and "full-name" and "email" as kinds of [[Attribute]]s.
 
  
==Misc ==
+
== See Also ==
* [[Context Data Model 1.1 Open Issues]]
+
* [[Higgins Data Model 2.0]]
* [[Context Data Model Related Resources]]
+
  
[[Category:Higgins Data Model]]
+
[[Category:Higgins 2]]
[[Category:Context Data Model 1.1]]
+

Revision as of 17:19, 25 April 2011

Version

  • This page describes version 2.0 of the Context Data Model
  • See Context Data Model 1.0 for the released Higgins 1.0 version

Introduction

Although the CDM can be used for almost any kind of data, its focus is to provide a foundation for integrating, unifying, and sharing identity-related data. In particular we are focused on information about a person, a group or an entire organization. This might include contact information, authentication data, preferences, email addresses, interests, and employer-related information. An object representing a single person might have relationships to other objects and other people in the same or different data contexts.

See Higgins Data Model Intro (PPT) for an overview. Note that this presentation has not been updated to CDM 2.0.

Motivation

Information fragmentation is a pervasive problem. Even seemingly simple activities depend on information from a number of heterogeneous sources. The information may be fragmented by physical location, device, application, middleware, data storage or platform. By providing a common data model, data from multiple locations and systems can be unified at best, or at least correlated.

There is a great deal of interest among Web developers in solving interoperability and providing data portability. See, for example, http://DataPortability.org and many other related efforts. In this quest, the Context Data Model can be a powerful enabler for interoperability of identity-related information across the "silos."

Why a Common Model?

Providing a common data model is not the only approach to data unification. However, every unification strategy involves choosing some kind of lowest common denominator. It is all a question of how low is low. The lower the level, the easier it is to do the unification, but the more lossy the result. For example, consider raw text. It's easy to index, search, and copy/paste, but very lossy. Or consider XML, which offers a common syntax for describing a series of attributes of a given object and values for each of the attributes, although still without any defined semantics.

Kinds of Data

The data model's focus is on the unification of identity-related data. We need to be able to create rich, contextualized representations of people, groups and organizations. These objects have attributes that range from simple literals (identification attributes, authentication data attributes, names, email addresses and telephone numbers) to complex attributes that are essentially links to other objects: people, groups, documents, calendar events, music preferences, and so on. These relationship attributes might be "friend", "manager", "likes", "owns", etc.

A key innovation in the model is the Higgins correlation attribute. If object a has a correlation link to object b, this implies that both a and b are representations of the same person, organization, thing or concept that exists outside of the Higgins model. Since a and b may be in different contexts, each using differing and incompatible semantics, the semantics of the correlation attribute are much weaker than saying that a and b are "the same" and that their descriptions can therefore be logically merged (as, for example, would be implied by owl:sameAs).
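The difference between correlating and merging can be sketched in a few lines of Python. This is an illustration only, not Higgins code: the context names, entity identifiers, attribute names and the helper functions are all hypothetical, and treating a correlation link as traversable in both directions is an assumption made for the example.

```python
# Two contexts, each describing the same real-world person in its own terms.
# All identifiers and attribute names here are hypothetical.
contexts = {
    "hr": {"emp-17": {"fullName": ["Alice Smith"], "email": ["asmith@corp.example"]}},
    "social": {"alice": {"nick": ["ali"], "email": ["alice@home.example"]}},
}

# A correlation link between (context, entity) references.
correlations = [(("hr", "emp-17"), ("social", "alice"))]


def representations(ref):
    """Return ref plus every reference directly correlated with it.

    Assumption: the link is followed in both directions.
    """
    found = [ref]
    for a, b in correlations:
        if a == ref and b not in found:
            found.append(b)
        elif b == ref and a not in found:
            found.append(a)
    return found


def view(ref):
    """Show each correlated representation side by side, without merging."""
    return {r: contexts[r[0]][r[1]] for r in representations(ref)}


side_by_side = view(("hr", "emp-17"))
```

Each representation keeps its own attributes and its own context's semantics; the two "email" attributes are never combined into one description, which is what a merge under owl:sameAs would imply.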

More about Interoperability

Saying we desire interoperability can mean many different things. At the least it should mean that we can navigate through and inspect data objects and their associated attributes/relationships within any Context through the Higgins IdAS API. This is part of what motivates Context Data Model Goals [2], [3] and [4]. At this level of interoperability we may not understand the meaning of the objects and the attributes, but we can know that they are there.

Beyond inspection and navigation, Higgins aspires to support applications that can also edit context data. We envision Higgins-based applications with user interfaces that can manipulate data contained in any Context from any Context Provider bound into Higgins. This implies two things:

  1. We require that the semantics of the attributes of objects be defined in a single well-defined (unambiguous) manner. If the model has more degrees of freedom than the absolute minimum necessary, ambiguity will arise where different Context Providers express the same semantic in different ways. For more about this see [6] in Data Model Goals.
  2. The specific schema of a Context's use of the abstract CDM must be exposed at the Context Provider (SPI) and IdAS (API) levels. This exposure allows an application to know what the valid degrees of freedom in the structure of the data are, and what values its data fields may assume. The application can learn from the schema what datatypes are used to describe the value of a given attribute (e.g. a string, a non-zero number or a date). It can learn what kinds of attributes may optionally be added to an object (and which may not). And it can learn what kinds of required and/or optional relationships are allowed with objects of various kinds. For more about the need for a common schema language see [7] in Data Model Goals.
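As a rough illustration of the second point, the sketch below shows how an editing application might consult an exposed schema before attempting a change. It is not the IdAS API: the SCHEMA dictionary, the "Customer" class, its attribute names and the check_edit helper are all hypothetical, chosen only to show the kind of question a schema lets an application answer.

```python
from datetime import date

# Hypothetical schema for one Context: for each entity class, the attributes
# that may appear, their datatype, and whether they are required.
SCHEMA = {
    "Customer": {
        "fullName": {"datatype": str, "required": True},
        "email": {"datatype": str, "required": False},
        "customerSince": {"datatype": date, "required": False},
    }
}


def check_edit(entity_class, attribute_id, value):
    """Return None if the edit is allowed by the schema, else a reason string."""
    allowed = SCHEMA.get(entity_class, {})
    if attribute_id not in allowed:
        return f"{attribute_id!r} may not be added to {entity_class}"
    expected = allowed[attribute_id]["datatype"]
    if not isinstance(value, expected):
        return f"{attribute_id!r} expects {expected.__name__}, got {type(value).__name__}"
    return None


ok = check_edit("Customer", "customerSince", date(2011, 4, 25))
wrong_type = check_edit("Customer", "customerSince", "2011-04-25")
unknown = check_edit("Customer", "shoeSize", 42)
```

A generic user interface could use answers like these to decide which fields to offer and which input controls to show, without any built-in knowledge of the particular Context.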

Context Data Model Goals provides an enumeration of the original top level design goals that ultimately led to the decision to use an RDF-based metamodel.

Relationship to RDF

The Context Data Model (CDM) encompasses the core semantics of the W3C's Resource Description Framework (RDF); anything expressible in RDF, with the exception of blank nodes, is expressible in the CDM, although the converse isn't true.

Most of the subtle but important differences between CDM and RDF are derived from differences in the choice of identifiers used to identify objects in each model. RDF is based on pure HTTP URIs, whereas CDM is based on a more generalized URI called a UDI. Objects identified by URIs in RDF are called Resources, whereas in CDM they are identified by UDIs and are called Entities.

CDM entities, attributes and values form interconnected graphs of objects. A graph is contained in a data set called a Context. This is analogous to an RDF dataset. Within this main context graph there may be zero or more sub-graphs known as sub-contexts. Each of these sub-contexts is analogous to a named graph in the Named Graph extensions to RDF. Like named graphs, contexts are themselves entities that may have arbitrary attributes and values.

The CDM differs from RDF on a syntactic (semantically lossless) level. In RDF, an object may have N properties of type T, each of which has a single value, whereas in the CDM an object may have only 0..1 property of type T, and if the property exists it has 1..N values. Further, in the CDM these properties are called Attributes.
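The regrouping described above, and the fact that it loses nothing, can be sketched as follows. This is a minimal illustration using plain Python tuples and dictionaries, not an RDF library or Higgins code; the function names and the example identifiers are assumptions made for the example.

```python
from collections import defaultdict


def triples_to_cdm(triples):
    """Group (subject, predicate, object) triples so that each subject has
    at most one attribute per type, holding all of that type's values."""
    entities = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        entities[subject][predicate].append(obj)
    return {s: dict(attrs) for s, attrs in entities.items()}


def cdm_to_triples(entities):
    """Expand each multi-valued attribute back into single-valued triples."""
    return [
        (subject, attribute_id, value)
        for subject, attrs in entities.items()
        for attribute_id, values in attrs.items()
        for value in values
    ]


# RDF style: two separate "email" properties, one value each.
triples = [
    ("urn:example:alice", "email", "alice@example.com"),
    ("urn:example:alice", "email", "alice@work.example"),
    ("urn:example:alice", "name", "Alice"),
]

# CDM style: one "email" attribute with two values.
cdm = triples_to_cdm(triples)
round_trip = cdm_to_triples(cdm)
```

Converting in one direction and back yields the same set of statements, which is the sense in which the difference is syntactic rather than semantic.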

Key Concepts

Top level concepts

  1. Context - a container of statements about Entities.
  2. Entity - an object identified by an EntityId.
  3. Attribute - a property of an Entity or a Context. The Attributes of an Entity are distinguished from one another by their AttributeIds. Attributes have 1..N values. These values may be simple (literals) or complex (other Entities).
  4. Data Range - a definition of a kind of simple, literal Attribute value. Generally a syntax restriction on one of the XML Schema datatypes.
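The four concepts above can be pictured with a small set of Python classes. This is a simplified sketch under stated assumptions rather than the Higgins implementation: the class layout, the field names (context_id, entity_id, attribute_id), the use of a regular expression to stand in for a Data Range's syntax restriction, and modelling a Context as a direct container of Entities are all choices made for illustration. Attributes on a Context itself are omitted.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DataRange:
    """A kind of simple literal value; the regex stands in for a syntax
    restriction on an XML Schema datatype (an assumption of this sketch)."""
    name: str
    pattern: str

    def accepts(self, value):
        return isinstance(value, str) and re.fullmatch(self.pattern, value) is not None


@dataclass
class Attribute:
    """A property with 1..N values; values are literals or other Entities."""
    attribute_id: str
    values: list

    def __post_init__(self):
        if not self.values:
            raise ValueError("an Attribute must have at least one value")


@dataclass
class Entity:
    """An object with at most one Attribute per attribute id."""
    entity_id: str
    attributes: dict = field(default_factory=dict)

    def add_value(self, attribute_id, value):
        existing = self.attributes.get(attribute_id)
        if existing is None:
            self.attributes[attribute_id] = Attribute(attribute_id, [value])
        else:
            existing.values.append(value)


@dataclass
class Context:
    """A container for the Entities described within it."""
    context_id: str
    entities: dict = field(default_factory=dict)

    def add(self, entity):
        self.entities[entity.entity_id] = entity
        return entity


crm = Context("urn:example:crm")
alice = crm.add(Entity("alice"))
bob = crm.add(Entity("bob"))
alice.add_value("email", "alice@example.com")
alice.add_value("email", "alice@work.example")  # second value, same Attribute
alice.add_value("manager", bob)                 # complex value: another Entity
phone = DataRange("phone", r"\+?[0-9]{7,15}")
```

Adding a second "email" value extends the existing Attribute instead of creating a new one, and the "manager" Attribute shows a complex value that links one Entity to another.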

cdm.owl

CDM uses concepts that, while they can be approximated in OWL, are unique to CDM and are not compatible with existing RDF/OWL data sources. These are described in a file called Cdm.owl 1.1. This cdm.owl file is provided only as a description, using RDF/OWL, of the foundational concepts of CDM (e.g. "Entity"). It should not be imported or used in creating ontologies; it was created only to describe the CDM metamodel itself.


See Also
