


Data Models 1.X

Revision as of 22:54, 23 January 2008 by Paul.socialphysics.org (Talk | contribs) (Overview)

Overview

The Higgins data model provides a common representation for identity, profile and relationship data to enable interoperability and data portability across heterogeneous sites and systems.

The model can provide data portability, interoperability and unification for three kinds of data: identity, profile and relationship. Identity information is related to identification, authentication, etc. Profile information can include preferences, interests, and associated objects such as events, things and wishlists. Relationships are links to other Digital Subjects--they can be used to represent friends and other kinds of associations with other Digital Subjects. A key kind of relation introduced in the model is the Higgins correlation--a link between different representations of the same real world object (e.g. you) in different contexts.

See Data Model Background for more information about the motivations for and design goals behind the model.

Motivation and Background

Information fragmentation is a pervasive problem. Even seemingly simple activities depend on information from a number of heterogeneous sources. The information may be fragmented by physical location, device, application, middleware, data storage or platform. By providing a common data model, data from multiple locations and systems can be unified.

There is a great deal of interest among Web developers in solving interoperability and providing data portability. See, for example, http://DataPortability.org and many other related efforts. In this quest, the Higgins data model can be a powerful enabler of interoperability for identity-related information across the "silos."

Providing a common data model is not the only approach to data unification. However, every unification strategy involves choosing some kind of lowest common denominator; the question is how low to go. The lower the level, the easier the unification, but the more information is lost. For example, consider raw text. It's easy to index, search, and copy/paste, but very lossy. Or consider XML, which offers a common syntax for describing the attributes of a given object and the values of each attribute, although still without any defined semantics.

The kinds of data we wish to unify are very roughly classified as identity, profile and relationship data. Identity information is related to identification, authentication, etc. Profile information can include preferences, interests, and associated objects such as events, things and wishlists. Relationships are links to other Digital Subjects--they can be used to represent friends and other kinds of associations with other Digital Subjects. A key kind of relation introduced in the model is the Higgins correlation--a link between different representations of the same real world object (e.g. you) in different contexts.

Saying we desire interoperability can mean many different things. At the least it should mean that we can navigate through and inspect data objects and their associated attributes/relationships within any Context through the Higgins API. This is part of what motivates Data Model Goals [2], [3] and [4]. At this level of interoperability we may not understand the meaning of the objects and the attributes, but we can know that they are there.

Moving further along the interoperability spectrum, if we add the requirement that every attribute/relationship is globally uniquely identifiable (see [5] in Data Model Goals) then we can use the Higgins IdAS API for more than a shallow syntactic parse of the data in various Contexts. We can, for example, assemble (join) attribute information about two Digital Subjects held in two separate Contexts, and perhaps implemented by separate providers, without collision or data loss. Along these lines, Higgins itself needs to implement certain kinds of cross-Context attribute data flows for correlated Digital Subjects.
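
To illustrate why globally unique attribute identifiers matter for such a join, here is a minimal Python sketch. It is not the IdAS API: the URIs, the contents of the two contexts, and the `join_subjects` and `local_name` helpers are all hypothetical and chosen only for illustration.

```python
# Hypothetical data: each Context maps attribute identifiers to values for
# one Digital Subject. None of these URIs come from the Higgins ontology.
crm_context = {
    "http://example.org/crm#name": "Alice Smith",
    "http://example.org/crm#email": "alice@example.com",
}
social_context = {
    "http://example.org/social#name": "alice_s",  # a screen name, not a full name
    "http://example.org/social#friendCount": 42,
}


def join_subjects(*contexts):
    """Merge attribute maps, refusing to silently overwrite a value."""
    merged = {}
    for attrs in contexts:
        for attr_id, value in attrs.items():
            if attr_id in merged and merged[attr_id] != value:
                raise ValueError(f"conflicting values for {attr_id}")
            merged[attr_id] = value
    return merged


def local_name(attr_id):
    """Strip the namespace, leaving only the context-local attribute name."""
    return attr_id.rsplit("#", 1)[-1]


# With globally unique identifiers, all four attributes survive the join.
joined = join_subjects(crm_context, social_context)

# With local names only, the two unrelated "name" attributes collide.
crm_local = {local_name(k): v for k, v in crm_context.items()}
social_local = {local_name(k): v for k, v in social_context.items()}
try:
    join_subjects(crm_local, social_local)
    collided = False
except ValueError:
    collided = True
```

In this sketch the namespaced join keeps both "name" attributes as distinct entries, while the join on bare local names is rejected because the two contexts mean different things by "name".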

Beyond inspection and navigation, Higgins aspires to support applications that can also edit Context data. We envision Higgins-based applications with user interfaces that can manipulate data contained in any Context from any Context Provider bound into Higgins. This implies a number of things. First, we require that the semantics of the attributes of objects be defined in a single well-defined (unambiguous) manner. If the model has more degrees of freedom than the absolute minimum necessary, ambiguity will arise where different Context Providers express the same semantic in different ways. For more about this see [6] in Data Model Goals.

Second, the specific schema of a Context's use of the abstract Higgins data model must be exposed at the CPI and API levels. This exposure allows an application to know what the valid degrees of freedom in the structure of the data are, and what values its data fields may assume. The application can learn from the schema what datatypes are used to describe the value of a given attribute (e.g. a string, a non-zero number or a date, etc). It can learn what kinds of attributes may optionally be added to an object (and which may not), etc. And it can learn what kinds of required and/or optional relationships are allowed with objects of various kinds. For more about the need for a common schema language see [7] in Data Model Goals.
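
As a rough illustration of what an application gains from an exposed schema, the following Python sketch checks an object against a schema that states each attribute's datatype and whether it is required. The dictionary-based schema format, the attribute names, and the `validate` helper are assumptions made for this example; Higgins itself expresses schemas in OWL, not in this form.

```python
from datetime import date

# Hypothetical schema for one kind of object in a Context: for each
# attribute, the expected Python type and whether it is required.
customer_schema = {
    "fullName": {"type": str, "required": True},
    "email": {"type": str, "required": True},
    "birthDate": {"type": date, "required": False},
}


def validate(obj, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    # Check required attributes and datatypes declared by the schema.
    for name, rule in schema.items():
        if name not in obj:
            if rule["required"]:
                problems.append(f"missing required attribute: {name}")
        elif not isinstance(obj[name], rule["type"]):
            problems.append(f"wrong datatype for attribute: {name}")
    # Reject attributes the schema does not allow on this kind of object.
    for name in obj:
        if name not in schema:
            problems.append(f"attribute not allowed by schema: {name}")
    return problems


valid = {"fullName": "Alice Smith", "email": "alice@example.com"}
invalid = {"fullName": "Bob", "birthDate": "1980-01-01", "shoeSize": 44}

valid_problems = validate(valid, customer_schema)
invalid_problems = validate(invalid, customer_schema)
```

An editing user interface could use the same information to decide which fields to offer, which to mark as mandatory, and which input widget suits each datatype.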

Design Goals

Data Model Goals provides an enumeration of the top-level design goals that ultimately led to the decision to use an RDF/OWL-based metamodel.

Higgins Data Model Definition

Rather than invent a new metamodel from scratch, the model is based on the W3C's Resource Description Framework (RDF) and Web Ontology Language (OWL 1.0). We used RDF and OWL to express a very abstract base ontology called higgins.owl (aka HOWL) that in turn describes the domain of identity information. The "Lexicon" project within the Identity Gang defined a set of identity domain concepts/terms that have been directly formalized in HOWL. These domain concepts include:

  1. Context
  2. ContextId
  3. SubjectId
  4. Digital Subject
  5. Entity
  6. Identity Attribute
  7. Relation

Their semantics (with the exception of Entity, which is not modeled) are expressed in higgins.owl, which is summarized on the Higgins Ontology page. The Higgins Ontology pages define the semantics of HOWL.

An overview presentation on the data model can be found here: Higgins Data Model Intro (PPT)

Extending HOWL

HOWL is a base ontology. To be useful in real-world applications, developers must develop specialized ontologies based on HOWL that describe a specific concrete domain.

For example, if a developer wanted to describe a CRM database, she would create an OWL ontology that would describe the data objects in the CRM database. This CRM database is called a Context in Higgins. If, for example, the database contained records about customers and those customers had full-names and email addresses, then the developer would define "Customer" as a sub-class of Digital Subject and "full-name" and "email" as kinds of Identity Attributes.
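
The CRM example can be sketched as a handful of RDF-style triples, written here as plain Python tuples. This is only an illustration: the two namespaces are placeholders rather than the real higgins.owl namespace, the term names `DigitalSubject` and `identityAttribute` are guesses at how the concepts might be named, and modeling attribute kinds with `rdfs:subPropertyOf` is an assumption rather than a statement of how HOWL does it. Only the two RDFS vocabulary URIs are standard.

```python
# Standard RDFS vocabulary terms.
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

# Placeholder namespaces; the real higgins.owl namespace is not given here.
HIGGINS = "http://example.org/higgins#"
CRM = "http://example.org/crm#"

# The specialized CRM ontology as (subject, predicate, object) triples.
crm_ontology = {
    (CRM + "Customer", RDFS_SUBCLASS_OF, HIGGINS + "DigitalSubject"),
    (CRM + "fullName", RDFS_SUBPROPERTY_OF, HIGGINS + "identityAttribute"),
    (CRM + "email", RDFS_SUBPROPERTY_OF, HIGGINS + "identityAttribute"),
}


def specializations_of(base, relation, triples):
    """Return the terms directly declared as specializations of `base`."""
    return sorted(s for (s, p, o) in triples if p == relation and o == base)


subject_kinds = specializations_of(
    HIGGINS + "DigitalSubject", RDFS_SUBCLASS_OF, crm_ontology
)
attribute_kinds = specializations_of(
    HIGGINS + "identityAttribute", RDFS_SUBPROPERTY_OF, crm_ontology
)
```

Because every CRM term is tied back to a base term, a generic tool that knows only the base ontology can still recognize "Customer" as a kind of Digital Subject and "email" as a kind of Identity Attribute.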

Here are some HOWL-based Ontologies:

HOWL and IdAS

The Identity Attribute Service (IdAS) provides a Java API that exposes read/write-able data from a wide variety of external data sources in the common Higgins model. The IdAS API implements but does not define the semantics of the Higgins data model.

Context Provider plug-ins to IdAS are used to adapt an external system, site, database or other data source to the IdAS API. These Context Providers are responsible for data transformation between the Higgins model and their own internal data model. Higgins does not constrain the Context Provider's choice of data representation; it could be XML-based, object-oriented, relational, or anything else.

Context Providers can be used to adapt data stores/sources such as:

  • Directories: LDAP stores like eDirectory, Active Directory, OpenLDAP, etc.
  • Relational databases used by enterprise apps to store identity/profile information.
  • Digital social networks (node-edge graphs): data behind Facebook, MySpace, LinkedIn, etc; or the graphs created by mining email traffic
  • Email/IM/collaboration client account data: email and IM client accounts, contact/buddy lists
  • Identity/profile data stored in website "silos": personal information stored on sites like eBay, Amazon, Google Groups, Yahoo Groups
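
The adapter role of a Context Provider described above can be sketched in Python as follows. This does not reproduce the actual IdAS interfaces (which are Java): the `ContextProvider` base class, the `get_subject` method, the table layout, and the attribute URIs are all hypothetical and serve only to show a provider translating its internal representation into a common shape.

```python
from abc import ABC, abstractmethod


class ContextProvider(ABC):
    """Hypothetical stand-in for a provider plug-in; not the IdAS interface."""

    @abstractmethod
    def get_subject(self, subject_id):
        """Return {'id': ..., 'attributes': {attr_uri: value}} or None."""


class RelationalCrmProvider(ContextProvider):
    """Adapts rows of a made-up 'customers' table to the common shape."""

    # Mapping from internal column names to (hypothetical) attribute URIs.
    COLUMN_TO_ATTRIBUTE = {
        "full_name": "http://example.org/crm#fullName",
        "email_addr": "http://example.org/crm#email",
    }

    def __init__(self, rows):
        # Index the rows by primary key for direct lookup.
        self._rows = {row["cust_id"]: row for row in rows}

    def get_subject(self, subject_id):
        row = self._rows.get(subject_id)
        if row is None:
            return None
        # Translate columns to attribute URIs, skipping NULL columns.
        attributes = {
            uri: row[column]
            for column, uri in self.COLUMN_TO_ATTRIBUTE.items()
            if row.get(column) is not None
        }
        return {"id": subject_id, "attributes": attributes}


provider = RelationalCrmProvider(
    [{"cust_id": "c-1", "full_name": "Alice Smith", "email_addr": None}]
)
subject = provider.get_subject("c-1")
```

A provider for an LDAP directory or a social network would implement the same method over a very different internal model, which is what lets callers treat all of these sources uniformly.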

Open Issues

Scope

The data model addresses "The need for interoperability" described here: Higgins Goals. In addition, items #3 and #5 of the charter state or imply the need for a robust identity and social networking data model:

Scope item 3. Provide an API and data model for the virtual integration and federation of identity and security information from a wide variety of sources.
Scope item 5. Provide a social relationship data integration framework that enables these relationships to be persistent and reusable across application boundaries.

References

RDF/OWL Related Resources

Misc Resources

  • http://identityschemas.org
  • "D3.2: Models" FIDIS, October 2005 (PDF, 74 pages). Summary: "The objective of this document is to present in a synthetic way different models of representation of a person ("person schema") that can be used in different application domains."
  • eduPerson spex

Links
