
Difference between revisions of "Data Models 1.X"

(Motivation)
Line 12: Line 12:
 
An overview presentation on the data model can be found here: [http://www.eclipse.org/higgins/images/Higgins_Data_Model.ppt Higgins Data Model Intro (PPT)]
 
An overview presentation on the data model can be found here: [http://www.eclipse.org/higgins/images/Higgins_Data_Model.ppt Higgins Data Model Intro (PPT)]
  
===Kinds of interoperability===
+
==Kinds of interoperability==
  
 
Saying we desire interoperability can mean many different things. At the least it should mean that we can navigate through and inspect data objects and their associated attributes/relationships within any [[Context]] through the Higgins API. This is part of what motivates [[Data Model Goals]] [2], [3] and [4]. At this level of interoperability we may not understand the meaning of the objects and the attributes, but we can know that they are there.
 
Saying we desire interoperability can mean many different things. At the least it should mean that we can navigate through and inspect data objects and their associated attributes/relationships within any [[Context]] through the Higgins API. This is part of what motivates [[Data Model Goals]] [2], [3] and [4]. At this level of interoperability we may not understand the meaning of the objects and the attributes, but we can know that they are there.
Line 22: Line 22:
 
Second, the ''specific'' schema of a [[Context|Context's]] use of the abstract Higgins data model must be exposed at the CPI and API levels. This exposure allows an application to know what the valid degrees of freedom in the structure of the data are, and the values of its data fields may assume. The application can learn from the schema what datatypes are used to describe the value of a given attribute (e.g. a string, a non-zero number or a date, etc). It can learn what kinds of attributes may optionally be added to an object (and which may not), etc. And it can learn or what kinds of required and/or optional relationships are allowed with objects of various kinds. For more about the need for a common schema language see [7] in [[Data Model Goals]].
 
Second, the ''specific'' schema of a [[Context|Context's]] use of the abstract Higgins data model must be exposed at the CPI and API levels. This exposure allows an application to know what the valid degrees of freedom in the structure of the data are, and the values of its data fields may assume. The application can learn from the schema what datatypes are used to describe the value of a given attribute (e.g. a string, a non-zero number or a date, etc). It can learn what kinds of attributes may optionally be added to an object (and which may not), etc. And it can learn or what kinds of required and/or optional relationships are allowed with objects of various kinds. For more about the need for a common schema language see [7] in [[Data Model Goals]].
  
===Design Goals===
+
==Design Goals==
 
* [[Data Model Goals]] provides an an enumeration of the top level design goals that ultimately led to the decision to using an RDF/OWL-based metamodel.
 
* [[Data Model Goals]] provides an an enumeration of the top level design goals that ultimately led to the decision to using an RDF/OWL-based metamodel.
  
===The data model and IdAS ===
+
== Higgins Data Model Definition ==
[[Context Provider]]s are plug-ins to the [[Identity Attribute Service]] (IdAS) Component used in many [[Deployments]]). They are responsible for data transformation between the Higgins model and their own internal data model. Higgins does not constrain the [[Context Provider|Context Provider's]] choice of data representation; it could be XML-based, object-oriented, relational, or anything else.
+
  
Here are some examples of some of the [[Context Provider|Context Providers]] envisioned:
+
Rather than invent a new metamodel from scratch, the model is based on the W3C's Resource Description Framework (RDF) and Web Ontology Language (OWL 1.0). We used RDF and OWL to express a very abstract base ontology called higgins.owl (aka HOWL) that in turn describe the domain of identity information. The "Lexicon" project within the Identity Gang defined a set of identity domain concepts/terms that have been directly formalized in HOWL. These domain concepts include:
* Directories: LDAP stores like eDirectory, Active Directory, OpenLDAP, etc...
+
# [[Context]]
* Relational databases used by enterprise apps to store identity/profile information.
+
* Digital social networks (node-edge graphs): data behind Facebook, MySpace, LinkedIn, etc; or the graphs created by mining email traffic
+
* Email/IM/collaboration client account data: email and IM client accounts, contact/buddy lists
+
* Identity/profile data stored in website "silos": personal information stored sites like eBay, Amazon, Google Groups, Yahoo Groups
+
 
+
==Concepts used in the data model==
+
# [[Claim]]
+
# [[Context]]
+
 
# [[ContextId]]  
 
# [[ContextId]]  
 
# [[SubjectId]]  
 
# [[SubjectId]]  
 
# [[Digital Subject]]
 
# [[Digital Subject]]
 
# [[Entity]]
 
# [[Entity]]
# [[I-Card]]
 
# [[Identity Selector]]
 
 
# [[Identity Attribute]]
 
# [[Identity Attribute]]
# [[Subject Relation]]
+
# [[Relation]]
  
==Higgins.owl==
+
Their semantics (with the exception of [[Entity]] which is not modeled) have been expressed in OWL that is summarized in the [[Higgins Ontology]] page.
* [[Higgins Ontology]] --summary of higgins.owl (aka HOWL) classes and properties
+
  
Example Context Schemas (based on higgins.owl):
+
== Extending HOWL ==
 +
HOWL is a base ontology. To be useful in real-world applications developers must develop specialized ontologies based on HOWL that describe a specific concrete domain.  
 +
 
 +
For example, if a developer wanted to describe a CRM database, she would create an OWL ontology that would describe the data objects in the CRM database. This CRM database is called a [[Context]] in Higgins. If, for example, the database contained records about customers and those customers had full-names and email addresses, then the developer would define "Customer" as a sub-class of [[Digital Subject]] and "full-name" and "email" as kinds of [[Identity Attributes]].
 +
 
 +
Here are some HOWL-based Ontologies:
 
* [[test-person Example Context Ontology]]  
 
* [[test-person Example Context Ontology]]  
 
* [[Person-with-address Example Context Ontology]]
 
* [[Person-with-address Example Context Ontology]]
 
* [[Person-with-friend Example Context Ontology]]
 
* [[Person-with-friend Example Context Ontology]]
 +
 +
== HOWL and IdAS ==
 +
 +
The [[Identity Attribute Service]] provides a Java API that exposes read/write-able data from a wide variety of external data sources in the common Higgins model.
 +
 +
[[Context Provider]] plug-ins are used to adapt external system, site, database or other data source to the IdAS API. [[Context Provider]]s are responsible for data transformation between the Higgins model and their own internal data model. Higgins does not constrain the [[Context Provider|Context Provider's]] choice of data representation; it could be XML-based, object-oriented, relational, or anything else.
 +
 +
[[Context Provider]]s can be used to adapt data stores/sources such as:
 +
* Directories: LDAP stores like eDirectory, Active Directory, OpenLDAP, etc...
 +
* Relational databases used by enterprise apps to store identity/profile information.
 +
* Digital social networks (node-edge graphs): data behind Facebook, MySpace, LinkedIn, etc; or the graphs created by mining email traffic
 +
* Email/IM/collaboration client account data: email and IM client accounts, contact/buddy lists
 +
* Identity/profile data stored in website "silos": personal information stored sites like eBay, Amazon, Google Groups, Yahoo Groups
  
 
==Open Issues==
 
==Open Issues==
Line 64: Line 70:
 
: '''Scope item 5.''' Provide a social relationship data integration framework that enables these relationships to be persistent and reusable across application boundaries.
 
: '''Scope item 5.''' Provide a social relationship data integration framework that enables these relationships to be persistent and reusable across application boundaries.
  
==Reference==
+
== References ==
 
===RDF/OWL Related Resources===
 
===RDF/OWL Related Resources===
* OWL 1.1:
+
* OWL
 
** W3C OWL working group: http://www.w3.org/2007/OWL/wiki/OWL_Working_Group  
 
** W3C OWL working group: http://www.w3.org/2007/OWL/wiki/OWL_Working_Group  
 
** OWL 1.1 at Google Code: http://code.google.com/p/owl1-1/
 
** OWL 1.1 at Google Code: http://code.google.com/p/owl1-1/

Revision as of 22:38, 23 January 2008

Overview

The Higgins data model provides a common representation for identity, profile and relationship data to enable interoperability and data portability across heterogeneous sites and systems.

Motivation

Information fragmentation is a pervasive problem. Even seemingly simple activities depend on information from a number of heterogeneous sources. The information may be fragmented by physical location, device, application, middleware, data storage or platform. By providing a common data model, data from multiple locations and systems can be unified.

There are approaches to data unification other than providing a common data model. However, every unification strategy involves choosing some kind of lowest common denominator. It is all a question of how low is low. The lower the level, the easier the unification, but the more information is lost. For example, consider raw text. It's easy to index, search, and copy/paste, but very lossy. Or consider XML, which offers a common syntax for describing a series of attributes of a given object and values for each of the attributes, although still without any defined semantics.

The kinds of data we wish to unify are very roughly classified as identity, profile and relationship data. Identity information is related to identification, authentication, etc. Profile information can be preferences, interests, and associated objects such as events, things, and wishlists. Relationships are links to other Digital Subjects; they can be used to represent friends and other kinds of associations with other Digital Subjects. A key kind of relation introduced in the model is the Higgins correlation: a link between different representations of the same real world object (e.g. you) in different contexts.

An overview presentation on the data model can be found here: Higgins Data Model Intro (PPT)

Kinds of interoperability

Saying we desire interoperability can mean many different things. At the least it should mean that we can navigate through and inspect data objects and their associated attributes/relationships within any Context through the Higgins API. This is part of what motivates Data Model Goals [2], [3] and [4]. At this level of interoperability we may not understand the meaning of the objects and the attributes, but we can know that they are there.

Moving further along the interoperability spectrum, if we add the requirement that every attribute/relationship is globally uniquely identifiable (see [5] in Data Model Goals), then we can use the Higgins IdAS API for more than a shallow syntactic parse of the data in various Contexts. We can, for example, assemble (join) attribute information about two Digital Subjects held in two separate Contexts, and perhaps implemented by separate providers, without collision or data loss. Along these lines, Higgins itself needs to implement certain kinds of cross-Context attribute data flows for correlated Digital Subjects.
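To make the join idea concrete, here is a minimal sketch in plain Java. It is not the IdAS API: the class name, the join method, and the example.org attribute identifiers are all assumptions made for illustration. It only shows why globally unique attribute identifiers let values from two Contexts be merged without one overwriting the other.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the IdAS API.
class AttributeJoinDemo {

    // Merges the attributes of two correlated Digital Subjects.
    // Keys are globally unique attribute identifiers, so an equal key
    // means "the same attribute" and its values can be combined safely.
    static Map<URI, List<String>> join(Map<URI, List<String>> first,
                                       Map<URI, List<String>> second) {
        Map<URI, List<String>> merged = new LinkedHashMap<>();
        for (Map<URI, List<String>> source : List.of(first, second)) {
            for (Map.Entry<URI, List<String>> entry : source.entrySet()) {
                // Keep every value; nothing is dropped or overwritten.
                merged.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                      .addAll(entry.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Illustrative identifiers only; real ones would come from a schema.
        URI email = URI.create("http://example.org/attrs#email");
        URI nickname = URI.create("http://example.org/attrs#nickname");

        Map<URI, List<String>> directoryContext =
                Map.of(email, List.of("alice@example.org"));
        Map<URI, List<String>> socialContext =
                Map.of(email, List.of("alice@social.example"),
                       nickname, List.of("ali"));

        System.out.println(join(directoryContext, socialContext));
    }
}
```

If the two Contexts had used local names such as "mail" and "email" instead, the join would need a per-provider mapping, which is the situation the unique-identifier requirement is meant to avoid.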

Beyond inspection and navigation, Higgins aspires to support applications that can also edit Context data. We envision Higgins-based applications with user interfaces that can manipulate data contained in any Context from any Context Provider bound into Higgins. This implies a number of things. First, we require that the semantics of the attributes of objects be defined in a single well-defined (unambiguous) manner. If the model has more degrees of freedom than the absolute minimum necessary, ambiguity will arise where different Context Providers express the same semantics in different ways. For more about this, see [6] in Data Model Goals.

Second, the specific schema of a Context's use of the abstract Higgins data model must be exposed at the CPI and API levels. This exposure allows an application to know what the valid degrees of freedom in the structure of the data are, and what values its data fields may assume. The application can learn from the schema what datatypes are used to describe the value of a given attribute (e.g. a string, a non-zero number, or a date). It can learn what kinds of attributes may optionally be added to an object (and which may not). And it can learn what kinds of required and/or optional relationships are allowed with objects of various kinds. For more about the need for a common schema language, see [7] in Data Model Goals.
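As a rough illustration of what an application could do with an exposed schema, the sketch below checks a proposed edit against a tiny schema before writing it. Everything here is hypothetical: the AttributeRule class, the attribute names, and the validation rules are assumptions for this example and are not taken from the Higgins CPI or API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch, not the Higgins schema API.
class SchemaDemo {

    // One schema entry: whether the attribute is required and
    // which values it may assume.
    static final class AttributeRule {
        final boolean required;
        final Predicate<String> validValue;

        AttributeRule(boolean required, Predicate<String> validValue) {
            this.required = required;
            this.validValue = validValue;
        }
    }

    // Illustrative schema: a required non-empty string and an
    // optional non-zero integer.
    static Map<String, AttributeRule> exampleSchema() {
        Map<String, AttributeRule> schema = new LinkedHashMap<>();
        schema.put("fullName", new AttributeRule(true, s -> !s.isEmpty()));
        schema.put("loyaltyPoints", new AttributeRule(false,
                s -> s.matches("-?[0-9]{1,9}") && Integer.parseInt(s) != 0));
        return schema;
    }

    // Returns true only if every candidate attribute is allowed and valid,
    // and every required attribute is present.
    static boolean accepts(Map<String, AttributeRule> schema,
                           Map<String, String> candidate) {
        for (Map.Entry<String, String> entry : candidate.entrySet()) {
            AttributeRule rule = schema.get(entry.getKey());
            if (rule == null || !rule.validValue.test(entry.getValue())) {
                return false; // unknown attribute or invalid value
            }
        }
        for (Map.Entry<String, AttributeRule> entry : schema.entrySet()) {
            if (entry.getValue().required && !candidate.containsKey(entry.getKey())) {
                return false; // required attribute missing
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, AttributeRule> schema = exampleSchema();
        System.out.println(accepts(schema,
                Map.of("fullName", "Alice Example", "loyaltyPoints", "120")));
        System.out.println(accepts(schema,
                Map.of("fullName", "Alice Example", "shoeSize", "9")));
    }
}
```

A generic editing user interface could use the same information to decide which fields to show, which to mark as mandatory, and which input to reject before it reaches the Context Provider.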

Design Goals

  • Data Model Goals provides an enumeration of the top level design goals that ultimately led to the decision to use an RDF/OWL-based metamodel.

Higgins Data Model Definition

Rather than invent a new metamodel from scratch, the model is based on the W3C's Resource Description Framework (RDF) and Web Ontology Language (OWL 1.0). We used RDF and OWL to express a very abstract base ontology called higgins.owl (aka HOWL) that in turn describes the domain of identity information. The "Lexicon" project within the Identity Gang defined a set of identity domain concepts/terms that have been directly formalized in HOWL. These domain concepts include:

  1. Context
  2. ContextId
  3. SubjectId
  4. Digital Subject
  5. Entity
  6. Identity Attribute
  7. Relation

Their semantics (with the exception of Entity which is not modeled) have been expressed in OWL that is summarized in the Higgins Ontology page.

Extending HOWL

HOWL is a base ontology. To be useful in real-world applications, developers must develop specialized ontologies based on HOWL that describe a specific concrete domain.

For example, if a developer wanted to describe a CRM database, she would create an OWL ontology that would describe the data objects in the CRM database. This CRM database is called a Context in Higgins. If, for example, the database contained records about customers and those customers had full-names and email addresses, then the developer would define "Customer" as a sub-class of Digital Subject and "full-name" and "email" as kinds of Identity Attributes.
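The specialization described in the CRM example can be mirrored in a few lines of Java. This is only an analogy: a real Context ontology would be written in OWL, and the term names used here (DigitalSubject, IdentityAttribute, Customer, PremiumCustomer, fullName, email) are placeholders chosen for this sketch rather than identifiers from higgins.owl.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: records "is a kind of" links between terms.
// A real Context ontology would express these in OWL instead.
class OntologySketch {

    private final Map<String, String> parentOf = new HashMap<>();

    // Declares that 'term' specializes 'parent'.
    void declareKindOf(String term, String parent) {
        parentOf.put(term, parent);
    }

    // Walks up the declared links; the 'seen' set guards against cycles.
    boolean isKindOf(String term, String ancestor) {
        Set<String> seen = new HashSet<>();
        String current = parentOf.get(term);
        while (current != null && seen.add(current)) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = parentOf.get(current);
        }
        return false;
    }

    // The CRM example: base terms stand in for HOWL concepts,
    // the others are the developer's domain-specific additions.
    static OntologySketch crmExample() {
        OntologySketch crm = new OntologySketch();
        crm.declareKindOf("Customer", "DigitalSubject");
        crm.declareKindOf("PremiumCustomer", "Customer"); // extra level, illustrative
        crm.declareKindOf("fullName", "IdentityAttribute");
        crm.declareKindOf("email", "IdentityAttribute");
        return crm;
    }

    public static void main(String[] args) {
        OntologySketch crm = crmExample();
        System.out.println(crm.isKindOf("PremiumCustomer", "DigitalSubject"));
        System.out.println(crm.isKindOf("email", "DigitalSubject"));
    }
}
```

The point of the design is that generic code only needs to understand the base terms; anything declared as a specialization of them can still be navigated without knowing the CRM domain.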

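Here are some HOWL-based ontologies:

  • test-person Example Context Ontology
  • Person-with-address Example Context Ontology
  • Person-with-friend Example Context Ontology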

HOWL and IdAS

The Identity Attribute Service provides a Java API that exposes read/write-able data from a wide variety of external data sources in the common Higgins model.

Context Provider plug-ins are used to adapt an external system, site, database, or other data source to the IdAS API. Context Providers are responsible for data transformation between the Higgins model and their own internal data model. Higgins does not constrain the Context Provider's choice of data representation; it could be XML-based, object-oriented, relational, or anything else.

Context Providers can be used to adapt data stores/sources such as:

  • Directories: LDAP stores like eDirectory, Active Directory, OpenLDAP, etc.
  • Relational databases used by enterprise apps to store identity/profile information
  • Digital social networks (node-edge graphs): data behind Facebook, MySpace, LinkedIn, etc., or the graphs created by mining email traffic
  • Email/IM/collaboration client account data: email and IM client accounts, contact/buddy lists
  • Identity/profile data stored in website "silos": personal information stored at sites like eBay, Amazon, Google Groups, Yahoo Groups
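The adapter role of a Context Provider can be sketched as follows. The SimpleContext interface and the RowBackedContext class are invented for this example and are far simpler than the real IdAS interfaces; the column names and the example.org attribute identifiers are likewise assumptions. The sketch shows a provider translating rows from a relational-style store into attributes named with common identifiers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the adapter idea, not the IdAS API.
class ProviderDemo {

    // What a caller sees: attributes of a subject, keyed by common identifiers.
    interface SimpleContext {
        Optional<Map<String, String>> attributesOf(String subjectId);
    }

    // Adapts table-like rows (column name -> value) to the common view.
    static final class RowBackedContext implements SimpleContext {
        private final Map<String, Map<String, String>> rowsBySubjectId;
        private final Map<String, String> columnToAttribute;

        RowBackedContext(Map<String, Map<String, String>> rowsBySubjectId,
                         Map<String, String> columnToAttribute) {
            this.rowsBySubjectId = rowsBySubjectId;
            this.columnToAttribute = columnToAttribute;
        }

        @Override
        public Optional<Map<String, String>> attributesOf(String subjectId) {
            Map<String, String> row = rowsBySubjectId.get(subjectId);
            if (row == null) {
                return Optional.empty(); // no such subject in this Context
            }
            Map<String, String> attributes = new LinkedHashMap<>();
            for (Map.Entry<String, String> column : row.entrySet()) {
                String attributeId = columnToAttribute.get(column.getKey());
                if (attributeId != null) {
                    // Only mapped columns are exposed; the rest stay internal.
                    attributes.put(attributeId, column.getValue());
                }
            }
            return Optional.of(attributes);
        }
    }

    // Illustrative data standing in for an enterprise database table.
    static SimpleContext example() {
        return new RowBackedContext(
                Map.of("42", Map.of("FULL_NAME", "Alice Example",
                                    "EMAIL", "alice@example.org",
                                    "INTERNAL_FLAG", "x")),
                Map.of("FULL_NAME", "http://example.org/attrs#fullName",
                       "EMAIL", "http://example.org/attrs#email"));
    }

    public static void main(String[] args) {
        System.out.println(example().attributesOf("42"));
        System.out.println(example().attributesOf("missing"));
    }
}
```

An LDAP-backed or social-network-backed provider would implement the same interface with a different internal representation, which is why the caller does not need to know how the data is stored.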

Open Issues

Scope

The data model addresses "The need for interoperability" described here: Higgins Goals. In addition, items #3 and #5 of the charter state or imply the need for a robust identity and social networking data model:

Scope item 3. Provide an API and data model for the virtual integration and federation of identity and security information from a wide variety of sources.
Scope item 5. Provide a social relationship data integration framework that enables these relationships to be persistent and reusable across application boundaries.

References

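RDF/OWL Related Resources

  • OWL
      • W3C OWL working group: http://www.w3.org/2007/OWL/wiki/OWL_Working_Group
      • OWL 1.1 at Google Code: http://code.google.com/p/owl1-1/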

Misc Resources

  • http://identityschemas.org
  • "D3.2: Models", FIDIS, October 2005 (PDF, 74 pages). Summary: "The objective of this document is to present in a synthetic way different models of representation of a person ("person schema") that can be used in different application domains."
  • eduPerson spex
