Data Models 1.X

Revision as of 11:12, 13 September 2007

Motivation

This work area falls under "The need for interoperability" described here: Higgins Goals. In addition, items #3 and #5 of the charter state or imply the need for a robust identity and social networking data model:

Scope item 3. Provide an API and data model for the virtual integration and federation of identity and security information from a wide variety of sources.
Scope item 5. Provide a social relationship data integration framework that enables these relationships to be persistent and reusable across application boundaries.

Overview

At the highest level the goal of the data model is to provide a common representation for identity, profile and relationship data in order to provide interoperability.

Information fragmentation is a pervasive problem. Even seemingly simple activities depend on information from a number of heterogeneous sources. The information may be fragmented by physical location, device, application, middleware, data storage or platform. By providing a common data model, data from multiple locations and systems can be unified.

Of course, there are approaches to data unification other than providing a common data model. However, every unification strategy involves choosing some kind of lowest common denominator. It is all a question of how low is low. The lower the level, the easier the unification, but the more lossy the result. For example, consider raw text. It is easy to index, search, and copy/paste, but very lossy. Or consider XML, which offers a common syntax for describing a series of attributes of a given object and the values of each attribute, although still without any defined semantics.

The kinds of data we wish to unify are very roughly classified as identity, profile and relationship data. Identity information is related to identification, authentication, etc. Profile information can include preferences, interests, and associated objects such as events, things, and wishlists. Relationships can be any kind of association between objects (typically between Digital Subjects), as well as affiliations.

Kinds of interoperability

Saying we desire interoperability can mean many different things. At the least it should mean that we can navigate through and inspect data objects and their associated attributes/relationships within any Context through the Higgins API. This is part of what motivates Data Model Goals [2], [3] and [4]. At this level of interoperability we may not understand the meaning of the objects and the attributes, but we can know that they are there.

Moving further along the interoperability spectrum, if we add the requirement that every attribute/relationship is globally uniquely identifiable (see [5] in Data Model Goals) then we can use the Higgins IdAS API for more than a shallow syntactic parse of the data in various Contexts. We can, for example, assemble (join) attribute information about two Digital Subjects held in two separate Contexts, and perhaps implemented by separate providers, without collision or data loss. Along these lines, Higgins itself needs to implement certain kinds of cross-Context attribute data flows for correlated Digital Subjects.
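The join described above can be sketched in a few lines of Python. This is only an illustration of why globally unique attribute identifiers avoid collisions; it is not the Higgins IdAS API. The function name join_attributes, the two example views, and all example.org / example.com URIs are assumptions made up for this sketch.

```python
# Hypothetical illustration only; none of these names come from the Higgins IdAS API.
from collections import defaultdict


def join_attributes(*subject_views):
    """Merge attribute maps whose keys are globally unique attribute URIs.

    Each view maps an attribute URI to a set of values. Because the keys are
    globally unique, attributes from different Contexts never overwrite each
    other; values for the same URI are unioned.
    """
    merged = defaultdict(set)
    for view in subject_views:
        for attribute_uri, values in view.items():
            merged[attribute_uri].update(values)
    return dict(merged)


# Two Contexts describing correlated Digital Subjects (made-up URIs and data).
hr_context_view = {
    "http://example.org/hr#name": {"Alice Smith"},
    "http://example.org/hr#title": {"Engineer"},
}
social_context_view = {
    "http://example.com/social#name": {"alice_s"},
    "http://example.com/social#knows": {"bob"},
}

joined = join_attributes(hr_context_view, social_context_view)

# For contrast: with purely local names, both Contexts would use the key
# "name" and a naive merge would silently drop one of the values.
naive = {**{"name": "Alice Smith"}, **{"name": "alice_s"}}
```

In this sketch both "name" attributes survive the join because each is qualified by its own namespace, whereas the naive merge keeps only the last value written.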

Beyond inspection and navigation, Higgins aspires to support applications that can also edit Context data. We envision Higgins-based applications with user interfaces that can manipulate data contained in any Context from any Context Provider bound into Higgins. This implies a number of things. First, we require that the semantics of the attributes of objects be defined in a single well-defined (unambiguous) manner. If the model has more degrees of freedom than the absolute minimum necessary, ambiguity will arise where different Context Providers express the same semantics in different ways. For more about this see [6] in Data Model Goals.

Second, the specific schema of a Context's use of the abstract Higgins data model must be exposed at the CPI and API levels. This exposure allows an application to know what the valid degrees of freedom in the structure of the data are, and what values its data fields may assume. The application can learn from the schema what datatypes are used to describe the value of a given attribute (e.g. a string, a non-zero number or a date, etc.). It can learn what kinds of attributes may optionally be added to an object (and which may not), etc. And it can learn what kinds of required and/or optional relationships are allowed with objects of various kinds. For more about the need for a common schema language see [7] in Data Model Goals.
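To make the role of an exposed schema concrete, here is a minimal Python sketch of what an application could do with one: check datatypes, reject attributes the schema does not allow, and detect missing required attributes. The classes AttributeSchema and ContextSchema and the example.org URIs are hypothetical; the actual Higgins schema is expressed in RDF/OWL, which this sketch does not attempt to model.

```python
# Hypothetical illustration only; Higgins expresses schemas in RDF/OWL,
# not with these made-up Python classes.
from dataclasses import dataclass
from datetime import date


@dataclass
class AttributeSchema:
    uri: str            # globally unique attribute identifier
    datatype: type      # datatype a value of this attribute must have
    required: bool = False


@dataclass
class ContextSchema:
    attributes: dict    # maps attribute URI -> AttributeSchema

    def validate(self, subject: dict) -> list:
        """Return a list of human-readable problems (empty if valid)."""
        problems = []
        for uri, value in subject.items():
            spec = self.attributes.get(uri)
            if spec is None:
                problems.append(f"attribute not allowed: {uri}")
            elif not isinstance(value, spec.datatype):
                problems.append(
                    f"wrong datatype for {uri}: expected {spec.datatype.__name__}"
                )
        for uri, spec in self.attributes.items():
            if spec.required and uri not in subject:
                problems.append(f"missing required attribute: {uri}")
        return problems


NAME = "http://example.org/schema#name"            # made-up URI
BIRTH_DATE = "http://example.org/schema#birthDate"  # made-up URI

schema = ContextSchema({
    NAME: AttributeSchema(NAME, str, required=True),
    BIRTH_DATE: AttributeSchema(BIRTH_DATE, date),
})

valid_problems = schema.validate({NAME: "Alice", BIRTH_DATE: date(1990, 1, 1)})
invalid_problems = schema.validate({BIRTH_DATE: "1990-01-01"})
```

An editing user interface could run a check like this before writing to a Context, so that it only offers the attributes and value types that the Context's schema permits.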

Design Goals

  • Data Model Goals provides an enumeration of the top level design goals that ultimately led to the decision to use an RDF/OWL-based metamodel.

The data model and IdAS

Context Providers are plug-ins to the Identity Attribute Service (IdAS) (see Architecture). They are responsible for data transformation between the Higgins model and their own internal data model. Higgins does not constrain the Context Provider's choice of data representation; it could be XML-based, object-oriented, relational, or anything else.

Here are some examples of the Context Providers envisioned:

  • Directories: LDAP stores like eDirectory, Active Directory, OpenLDAP, etc.
  • Relational databases used by enterprise apps to store identity/profile information.
  • Digital social networks (node-edge graphs): data behind Facebook, MySpace, LinkedIn, etc.; or the graphs created by mining email traffic
  • Email/IM/collaboration client account data: email and IM client accounts, contact/buddy lists
  • Identity/profile data stored in website "silos": personal information stored on sites like eBay, Amazon, Google Groups, Yahoo Groups
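The plug-in idea can be sketched as follows: each provider keeps its own internal representation and translates it into a shared attribute vocabulary on request. This is a rough Python illustration under stated assumptions, not the real IdAS Context Provider interface. The class names, the get_subject method, the sample data and the example.org URIs are all invented for this sketch.

```python
# Hypothetical illustration only; this is not the IdAS Context Provider interface.
from abc import ABC, abstractmethod

# Made-up attribute URIs standing in for a shared vocabulary.
NAME = "http://example.org/common#name"
EMAIL = "http://example.org/common#email"


class ContextProvider(ABC):
    """Assumed plug-in contract: return a subject in the common model."""

    @abstractmethod
    def get_subject(self, subject_id: str) -> dict:
        ...


class LdapLikeProvider(ContextProvider):
    """Internal model: directory entries with short LDAP-style attribute names."""

    _ATTRIBUTE_MAP = {"cn": NAME, "mail": EMAIL}

    def __init__(self, entries: dict):
        self._entries = entries

    def get_subject(self, subject_id: str) -> dict:
        entry = self._entries[subject_id]
        # Translate known attributes; ignore ones with no mapping in this sketch.
        return {
            self._ATTRIBUTE_MAP[key]: value
            for key, value in entry.items()
            if key in self._ATTRIBUTE_MAP
        }


class RelationalLikeProvider(ContextProvider):
    """Internal model: rows of (id, full_name, email)."""

    def __init__(self, rows: list):
        self._rows = {row[0]: row for row in rows}

    def get_subject(self, subject_id: str) -> dict:
        _, full_name, email = self._rows[subject_id]
        return {NAME: full_name, EMAIL: email}


ldap = LdapLikeProvider(
    {"uid=alice": {"cn": "Alice Smith", "mail": "alice@example.org", "objectClass": "person"}}
)
sql = RelationalLikeProvider([("42", "Bob Jones", "bob@example.org")])

providers = [(ldap, "uid=alice"), (sql, "42")]
subjects = [provider.get_subject(subject_id) for provider, subject_id in providers]
```

The calling code at the end treats both providers identically, which is the point of the design: the choice of internal representation stays private to each plug-in.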

Concepts used in the data model

Higgins.owl

Example Context Schemas (based on higgins.owl):

Open Issues

Reference

RDF/OWL Related Resources

Misc Resources

  • http://identityschemas.org
  • "D3.2: Models", FIDIS, October 2005 (PDF, 74 pages). Summary: "The objective of this document is to present in a synthetic way different models of representation of a person ("person schema") that can be used in different application domains."
  • eduPerson specs

See Also
