Difference between revisions of "Persona Data Model 2.0"

Revision as of 10:58, 6 January 2011


The Persona Data Model 2.0 is a vocabulary for describing people. It is based on the Higgins Data Model 2.0, which is in turn based on Context Data Model 2.0 (aka CDM 2.0). It is used by Attribute Data Service 2.0.

Person graph

A natural, human person is represented as a graph of p:Person entities (nodes, or vertices) interconnected by links (edges). Each node represents a different facet of the user (person). Each node is an entity (i.e. a set of attributes & values). These attributes may be simple literals (e.g. the user's first name) or they may be other entities. These latter complex attributes are rendered as links (edges) to other nodes, but these edges and nodes are not considered part of the person graph.

The graph is a logical abstraction. The data behind these nodes may be physically located anywhere on the Internet.

Typically each node in the person graph is located in its own container called a Context. The root node lies in a special context (one for each user) called the root context.

All of the main person entities can be reached by traversing links of the following kinds (although other links, e.g. p:source and foaf:knows, may also exist):

  • h:correlation
  • h:relation
  • h:indeterminate

In order to simplify the diagram below we follow a convention whereby the links are drawn between contexts whereas in reality the links are between the main p:Person objects within each of these contexts. Further, these main person entities may well themselves have complex attributes (i.e. links to other entities). These have also been omitted.

Unified contexts 2.0.120.png

Linked Contexts

The concept of a linked context involves a consuming person in one context being linked to a source person in another context. This is done by linking the person nodes using an intermediate SourceLink node. For example, in the diagram below the P3 person node in context C3 is linked to P1 in C1 and P2 in C2 via two SourceLink instances. C3 is considered to be linked to C1 and C2.

Linked contexts 2.0.103.png

These links allow a single consuming person node to aggregate one or more other person nodes from other contexts. This promotes reuse of persons and contexts, and minimizes copying and duplication. For example, a "recipient" person in one context might hold a name, address and phone number that correspond to a physical address, say the person's home. If the user uses this home address in 100 contexts (e.g. representing the user's relationship with 100 eCommerce sites), each of these 100 contexts' main persons can link to this shared home person and context, rather than each of the 100 contexts holding its own copy of this person.

Every p:source link also has an inverse p:consumer link pointing in the opposite direction. For clarity these "back" links are not shown above. Any person with more than one "incoming" p:source link (or, said another way, more than one outgoing p:consumer link) is essentially a "shared" person. Updating a shared person alters the attribute values returned by the contexts that use that shared person as a source.

These p:sourceLink..p:source links and p:consumer links are used only to link person entities, not entities in general.

The purpose of the intermediate SourceLink entity is to qualify the source linkage. It holds a set of attribute names (selected from the Flat Persona vocabulary) that indicate which attributes the link sources. In the example above, P1 is a source for fp:phone and fp:street-address, while P2 is a source for fp:givenName and fp:familyName.
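
As a rough illustration of how a consuming person might aggregate attributes through SourceLinks, the sketch below merges the attributes named by each link into the consumer's own attributes. The dictionary layout, the function name `effective_attributes`, the sample values, and the rule that sourced values override local ones are all assumptions made for this example; the data model itself does not prescribe them.

```python
def effective_attributes(person, persons, source_links):
    """Merge a consumer's own attributes with those pulled through its SourceLinks.

    persons:      person id -> {attribute name -> value}
    source_links: consumer id -> list of {"source": id, "attributes": set of names}
    Assumption: a sourced value overrides a locally stored one.
    """
    merged = dict(persons[person])
    for link in source_links.get(person, []):
        source_attrs = persons[link["source"]]
        # Only the attribute names listed on the SourceLink are sourced.
        for name in link["attributes"]:
            if name in source_attrs:
                merged[name] = source_attrs[name]
    return merged


# Hypothetical data mirroring the P1/P2/P3 example above.
persons = {
    "P1": {"fp:phone": "555-0100", "fp:street-address": "1 Main St",
           "fp:givenName": "not sourced"},
    "P2": {"fp:givenName": "Pat", "fp:familyName": "Example"},
    "P3": {},
}
source_links = {
    "P3": [
        {"source": "P1", "attributes": {"fp:phone", "fp:street-address"}},
        {"source": "P2", "attributes": {"fp:givenName", "fp:familyName"}},
    ],
}
```

Because P3 stores no copies of its own, a later change to P1's phone number is picked up the next time P3's attributes are computed.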

Access Control

The rules governing access to attributes in context C1 may be defined within C1, within an external control context C2 (where C2 is an instance of Template Context), or both. The access control policies are defined using the Higgins data model's access control vocabulary. Note: This C2 template context may also contain other rules and definitions unrelated to access control.

Representing Social Graphs

h:relation

HDM defines an h:relation complex attribute that is used in PDM to link one Person node to another where each Person node represents a different person. No symmetry is implied; thus the statement (A h:relation B) is akin to saying person A "knows of" person B.

Shown below are two social graph examples. One uses foaf:knows links and (unrelated to this) shows each node in its own context. The other uses h:relation links and (also unrelated) shows all person nodes in a single context. In the Work context we see that the user knows three colleagues but doesn't know how they know one another. In the Home & Family context we see that the user knows two people and that everyone knows one another. The foaf:knows links are shown in both directions, although logically this is redundant since foaf:knows is what is called a symmetric relation.

Nodes that represent the user are shown in purple. Nodes representing a person other than the user are shown in red.

Social graph 2.0.102.png

foaf:knows

To indicate that a person A "knows" person B where some level of reciprocated interaction between the parties is implied, we use foaf:knows.

Since foaf:knows is a broader concept than h:relation, foaf:knows is not a sub-attribute of h:relation. Thus if we had the statement "A h:relation B" then we might later add a second statement "A foaf:knows B" to add the stronger, broader (and symmetric) concept of "knowing."

h:indeterminate

HDM also defines an h:indeterminate link attribute on node A to indicate that its value(s) may or may not represent the same thing as is represented by A.

Implementation Note

Consumers of the HDM may traverse h:relation, h:correlation and h:indeterminate attribute links and (despite ignoring all other links) traverse the entire graph of Person nodes.
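
A minimal sketch of such a traversal follows, assuming the graph is held in memory as a dictionary from node id to its link attributes. The function name `reachable_persons`, the data layout, and the sample node ids are hypothetical and are not part of any Higgins API.

```python
from collections import deque

# The three link attributes named in the note above.
PERSON_LINKS = ("h:relation", "h:correlation", "h:indeterminate")


def reachable_persons(graph, start):
    """Breadth-first walk over Person nodes that follows only PERSON_LINKS.

    graph: node id -> {attribute name -> list of target node ids}
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for link in PERSON_LINKS:
            for target in graph.get(node, {}).get(link, []):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return seen


# Hypothetical sample: the p:source link on P1 is ignored by the walk.
sample_graph = {
    "P0": {"h:correlation": ["P1"], "h:relation": ["P2"]},
    "P1": {"h:indeterminate": ["P3"], "p:source": ["P9"]},
    "P2": {"h:relation": ["P0"]},
}
```

The `seen` set keeps the walk from looping when links form cycles, as the P0 and P2 entries do here.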

Vocabularies

Contexts may describe their contents in any vocabulary they wish so long as it builds on the Persona vocabulary. In the person graph example above all of the contexts except one describe their contents using the Persona Data Model (vocabulary) (shown as purple "PDM"s above). The exception is the managed i-card from Equifax which uses attribute (aka claim) URIs defined by the OASIS IMI TC and by the ICF's (Information Card Foundation) schema working group.

Naming

EntityIds and ContextIds

By convention the Persona Data Model 2.0 uses a restricted set of the full capabilities of CDM 2.0. The restriction is in the area of EntityIds and ContextIds. PDM 2.0 adds the following restriction:

  • All absolute entityIds and all contextIds MUST be XRI 2.0 URIs or Linked Data URIs

EntityIds

  • An entityId MAY be represented in relative form, i.e. relative to the base URI of the containing context. For example, "foo" is a relative entityId. If http://boo.com# is the ContextId containing "foo", then http://boo.com#foo is the absolute form
  • Whether or not an entityId is relative or absolute MUST be able to be determined by inspection of its syntax
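
The two rules above could be sketched as follows. The syntactic test used here (an absolute entityId contains a URI scheme separator, "://") is an assumption chosen for illustration; the text only requires that some syntactic test exists. The function names are hypothetical.

```python
def is_absolute(entity_id):
    """Assumed syntactic test: absolute ids carry a URI scheme such as http://."""
    return "://" in entity_id


def to_absolute(context_id, entity_id):
    """Resolve a relative entityId against the ContextId of its containing context."""
    if is_absolute(entity_id):
        return entity_id
    return context_id + entity_id
```

With the ContextId http://boo.com#, `to_absolute` turns "foo" into http://boo.com#foo and leaves an already absolute entityId unchanged.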

Resolution:

  • EntityIds MAY be resolvable
  • A resolvable entityId resolves to exactly one entity. This entity is called the authoritative entity (resource description).
    • Note: although the following capability is not yet used by Higgins code, it is possible that for any given context, C, there may exist both entityId references that resolve to entities within C as well as entityId references that resolve to entities within contexts other than C
  • EntityIds that resolve to a context outside of their containing context MUST be in absolute form

Context objects:

  1. The entityId of the special object within a context that represents the context itself is, by convention, named "_ContextSingleton"

Example

Here is an example of an absolute EntityId:

   http://xri.net/@mydex*ptrevithick/($context)*(amazon.com)//Person_1

The above is comprised of this ContextId:

   http://xri.net/@mydex*ptrevithick/($context)*(amazon.com)

Where:

  • http://xri.net/ is prepended to @mydex to convert the XRI into URI form
  • @mydex is Mydex's entry in the global "@" registry run by Neustar
  • @mydex*ptrevithick is Paul's i-name (a Mydex "community" i-name that cost Paul nothing, as Mydex runs its own registry). We could have substituted "=paul.trevithick" (which Paul would have had to pay for) instead of "@mydex*ptrevithick" as long as Paul had carefully set up the ($context)*(amazon.com) SEP identically at both the =paul.trevithick service and the @mydex*ptrevithick service.
  • ($context)*(amazon.com) is the portion that identifies the amazon.com profile context as opposed to some other context on Paul's PDS

The above contextId is concatenated with "//" and this relative EntityId:

   Person_1

Where:

  • Person_1 is the relative entity id within the context. In reality it is more likely to be an opaque identifier such as a34f38c390b8
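
The decomposition above can be sketched as a split at the last "//" in the absolute EntityId. The function name `split_entity_id` is hypothetical, and the sketch assumes the relative EntityId itself never contains "//".

```python
def split_entity_id(absolute_id):
    """Split an absolute EntityId into (ContextId, relative EntityId).

    The split is made at the LAST "//" because the scheme prefix
    ("http://") also contains that separator.
    """
    context_id, sep, relative_id = absolute_id.rpartition("//")
    # If the only "//" found is the one after the scheme, there is no entity part.
    if not sep or context_id.endswith(":") or not relative_id:
        raise ValueError("no '//' entity separator in " + repr(absolute_id))
    return context_id, relative_id


example_id = "http://xri.net/@mydex*ptrevithick/($context)*(amazon.com)//Person_1"
context_part, entity_part = split_entity_id(example_id)
```

Here `context_part` is the ContextId shown above and `entity_part` is Person_1.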

Issues

In the Higgins PDS, all data is stored as Statements. Each Statement consists of a Subject, Predicate, and Object. Each Statement is stored in a Context (AKA Graph). Contexts, Subjects, Predicates, and many Objects (some Objects are literal values) are specified by URIs. A URI which represents a Subject in one Statement may represent a Context, Predicate, or Object in another Statement. Statements pertaining to the same Subject or Object may be contained in different Contexts. A Store contains a set of Contexts. The Higgins PDS attempts to handle scenarios involving multiple Stores.

Many applications will want to access all of the Statements related to a specific:

  • Context
  • Subject
  • Object
  • Subject and Predicate
  • Object and Predicate
  • Subject in Context
  • Object in Context
  • etc.
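
For the single-Store case, the access patterns listed above all reduce to matching a partially specified (Context, Subject, Predicate, Object) tuple. The sketch below is an in-memory illustration only; the `Statement` type, the `match` function, and the sample data are assumptions and do not reflect the actual Higgins PDS storage API.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    context: str
    subject: str
    predicate: str
    obj: str


def match(statements, context=None, subject=None, predicate=None, obj=None):
    """Return the Statements that agree with every field that is not None."""
    pattern = (context, subject, predicate, obj)
    return [
        s for s in statements
        if all(want is None or want == have for want, have in zip(pattern, s))
    ]


# Hypothetical sample data spread over two Contexts.
statements = [
    Statement("C1", "P1", "fp:phone", "555-0100"),
    Statement("C2", "P1", "fp:givenName", "Pat"),
    Statement("C2", "P2", "h:relation", "P1"),
]

about_p1 = match(statements, subject="P1")                  # Subject
p1_in_c2 = match(statements, context="C2", subject="P1")    # Subject in Context
points_at_p1 = match(statements, predicate="h:relation", obj="P1")  # Object and Predicate
```

A linear scan like this is only workable when every Context lives in one Store, which is the point the following paragraph makes.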

When all of the Contexts are contained in the same Store, these queries are relatively trivial. However, much complexity is involved when they are contained in multiple, potentially discoverable, Stores. When Contexts are created, a decision must be made regarding which Store will contain them and how they will be named. When Statements are created, a decision must be made regarding which Context will contain them. When new Subjects and Objects are created, they must be named in a way that allows them to be unique and allows the discovery of which Contexts (and Stores) contain Statements pertaining to them. In the Higgins PDS, Users may establish relationships with different Store providers, and may decide to migrate from one provider to another. Therefore, the identifiers used for Contexts, Subjects, and Objects should not lock them into a specific Store.

Linked Data URIs and Universal Data Identifiers (UDIs) are technologies designed to help solve many of these problems. However, both seem limited in that they assume that all Statements related to a given Subject or Object are contained in the same Context (in the same Store).

There seems to be a need for a service that allows the registration, modification, and resolution of these identifiers in a way that allows an appropriate authority to maintain control over how they resolve. Many problems must be solved to make this work, including:

  • No single authority is appropriate for all Statements related to a specific Subject or Object, so who controls how the URIs resolve?
  • Even if Statements are protected via access control, knowing that there are Statements related to a specific Subject or Object in a specific Context or Store is often enough to disclose sensitive information.
  • Since not all Stores are expected to be equally performant or contain relevant data, a search for all Statements related to a specific Subject or Object should not require replication of all Contexts in all Stores, but how are we to determine which are important?

Due to the above issues, for the near term, the Higgins Project will progress with the following simplifying assumptions:

  • There will be one Store containing all Contexts accessed through any instance of the Higgins Personal Data Service and associated clients.
  • Subjects and Objects will be named relative to the authoritative Context where they are defined.
  • Therefore, a separate Name Resolution Service will not be needed.

Questions

  • Assuming we can find all of the Statements related to a Subject, how can we determine which Context is authoritative for that Subject?
    • It seems enough in most cases that the Entity URI includes the authoritative Context URI.
    • Should there be a statement: <contextname> <isauthoritativefor> <entityname> for every Entity in order to cover the cases where the Entity URI does not include the authoritative Context URI?
  • Other than the root Person, does the name of an Entity matter as long as it is unique within the scope of the authoritative Context?
    • A simple approach could be <contextname>//<entityname> where entityname is <classname>_<N> where N is a counter scoped to the Context.
      • A statement: <contextname> <entitycounter> <N> would be placed inside each Context.
      • N would be set to 0 initially and incremented each time a new Entity was created.
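
The counter-based naming proposed above might look like the sketch below. The `Context` class and its method are hypothetical. The sketch also assumes that the current value of N is used for the new name before N is incremented, which is one reading of the proposal and is consistent with a first Entity named Person_0.

```python
class Context:
    """Holds a Context name and the <entitycounter> value N scoped to it."""

    def __init__(self, name):
        self.name = name
        self.entity_counter = 0  # N is set to 0 initially

    def new_entity_id(self, class_name):
        """Build <contextname>//<classname>_<N>, then increment N."""
        entity_id = f"{self.name}//{class_name}_{self.entity_counter}"
        self.entity_counter += 1
        return entity_id


root = Context("http://pds.azigo.com/data?user=ptrevithick")
first_id = root.new_entity_id("Person")
second_id = root.new_entity_id("Address")  # "Address" is a made-up class name
```

Because the counter is scoped to the Context rather than to the class, the second Entity receives the suffix _1 even though it is the first of its class.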

Naming Conventions

Contexts

Contexts are named according to their scope: Universal, User, Application, Site, User @ Site, User @ Application, User @ User, Application @ Site, etc. Most Context names will be in the form of a URI defined under the PDS. Where possible, the URI will resolve to Linked Data. URIs will be of the form:

  http://<pdsservername>/data?[user=<username>][&][peer=<username>][&][app=<appname>][&][site=<sitename>][&][context=<contextname>]

Universal Contexts

There are a few Contexts that contain definitions of classes and other statements that are required in order for the system to function (e.g. those that define the Persona Data Model). These Contexts are usually defined in OWL/RDF files that are accessible at their Base URI. The statements in these files will be loaded into Contexts named with the Base URI:

Example

  http://www.eclipse.org/higgins/ontologies/2010/6/persona

Note

Universal Contexts are also accessible as Linked Data under an alternate name relative to the PDS.

Example

  http://pds.azigo.com/data?context=(http://www.eclipse.org/higgins/ontologies/2010/6/persona)

User Contexts

For every User with an account, there is exactly one Root Context.

Example

  http://pds.azigo.com/data?user=ptrevithick

User at Site Contexts

Example

  http://pds.azigo.com/data?user=ptrevithick&site=bestbuy.com

Application at Site Contexts

Example

  http://pds.azigo.com/data?app=formfiller&site=bestbuy.com

User at Peer Contexts

Example

  http://pds.azigo.com/data?user=ptrevithick&peer=mmcintosh
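
The Context URI form used throughout this section can be assembled by a small helper such as the one sketched here. The function name `context_uri` is hypothetical. The parameter order follows the template, the context value is wrapped in parentheses as in the Universal Context example, and no percent-encoding is applied; the last point is an assumption, since the examples show none.

```python
def context_uri(server, user=None, peer=None, app=None, site=None, context=None):
    """Build http://<pdsservername>/data?... from whichever scopes are given."""
    if context is not None:
        # The Universal Context example wraps the nested URI in parentheses.
        context = f"({context})"
    params = [("user", user), ("peer", peer), ("app", app),
              ("site", site), ("context", context)]
    query = "&".join(f"{key}={value}" for key, value in params if value is not None)
    return f"http://{server}/data?{query}"


user_at_site = context_uri("pds.azigo.com", user="ptrevithick", site="bestbuy.com")
app_at_site = context_uri("pds.azigo.com", app="formfiller", site="bestbuy.com")
universal = context_uri(
    "pds.azigo.com",
    context="http://www.eclipse.org/higgins/ontologies/2010/6/persona",
)
```

Each of the three values reproduces the corresponding example URI shown in this section.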

Non-Context Entities

URIs will be relative to the authoritative Context URI.

Example

  <contextname>//<entityname>

Root Person

The Root Person in each User's Root Context has a relative name of Person_0.

Example

  http://pds.azigo.com/data?user=ptrevithick//Person_0

Other Entities

TBD

Operations

@@@@ New section that describes some key operations that a PDS might perform and the effect of these operations on example data instances.


See Also

Rough notes; a bit long in the tooth:
