Difference between revisions of "COSMOS Design 237921"

Latest revision as of 10:36, 31 July 2008

Support PSVI in SML Validation

Change History

Name:	Date:	Revised Sections:
Ali Mehregani	06/23/2008	Initial creation
David Whiteman	07/30/2008	Added notes to Open Issues section

Workload Estimation

Rough workload estimate in person weeks
Process	Sizing	Names of people doing the work
Design	.5	Ali Mehregani
Code	6	Ali Mehregani / David Whiteman
Test	2.5	Ali Mehregani
Documentation	0
Build and infrastructure	0
Code review, etc.*	0
TOTAL	9

Terminologies/Acronyms

The terminologies/acronyms below are commonly used throughout this document.

Term	Definition
SML	Service Modeling Language
SML-IF	Service Modeling Language - Interchange Format
PSVI	Post Schema Validation Infoset

Purpose

This document is associated with bugzilla 237921 and bugzilla 237872.

The purpose of the feature is to use PSVI when constructing the structures required for validating SML constraints. Currently the validator parses through each definition document to determine element types, their derivation, and any associated SML constraints. Manually parsing definition documents hurts the validator's performance and increases its memory consumption. Manual parsing also cannot cover all possible cases. It is preferable to rely on an established interface rather than to parse each schema document manually.

What is PSVI?

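The Post Schema Validation Infoset (PSVI) is the XML infoset augmented with schema-level information, such as element declarations and type definitions, that becomes available when a document is validated against its schema. Xerces exposes this information through a set of interfaces that is available via DOM or SAX. The snippet below shows how the PSVI provider can be retrieved when SAX parsing a document: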


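import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.apache.xerces.xs.PSVIProvider;

SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
// Schema validation requires a namespace-aware parser
saxParserFactory.setNamespaceAware(true);
// Expose non-schema attributes (e.g. sml:acyclic) as synthetic annotations
saxParserFactory.setFeature("http://apache.org/xml/features/generate-synthetic-annotations", true);
saxParserFactory.setFeature("http://xml.org/sax/features/validation", true);

SAXParser newSaxParser = saxParserFactory.newSAXParser();
newSaxParser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
newSaxParser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource", <LIST OF SCHEMA INPUT>);

// The Xerces XMLReader also implements PSVIProvider
PSVIProvider psviProvider = (PSVIProvider)newSaxParser.getXMLReader();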

Given psviProvider, the PSVI of the current element or of one of its attributes can be retrieved from within a SAX callback such as startElement, using psviProvider.getElementPSVI() or getAttributePSVI(...). The following snippet determines the base type of the current element's type:


ElementPSVI elementPSVI = psviProvider.getElementPSVI();
XSTypeDefinition typeDefinition = elementPSVI.getTypeDefinition();
// getName() returns null when the base type is anonymous
System.out.println("The base type of the current element is: " + typeDefinition.getBaseType().getName());
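The same idea can be tried without Xerces on the classpath. The sketch below is not the PSVI API described above; it swaps in the standard JAXP TypeInfoProvider, which reports the schema type of the current element during validation. The class name TypeInfoSketch, the sample schema, and the sample document are assumptions made for illustration only.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: JAXP TypeInfoProvider used as a stand-in for Xerces PSVIProvider
class TypeInfoSketch {

    // Illustrative schema and instance document (not taken from COSMOS)
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='ItemType'><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:element name='item' type='ItemType'/>"
        + "</xs:schema>";
    static final String XML = "<item><name>disk</name></item>";

    /** Maps each element's local name to the schema type name reported while validating. */
    static Map<String, String> elementTypes(String xsd, String xml) throws Exception {
        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)));

        ValidatorHandler validatorHandler = schema.newValidatorHandler();
        TypeInfoProvider typeInfoProvider = validatorHandler.getTypeInfoProvider();
        Map<String, String> types = new LinkedHashMap<>();

        // Type information may only be queried from inside startElement/endElement
        validatorHandler.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                TypeInfo typeInfo = typeInfoProvider.getElementTypeInfo();
                if (typeInfo != null) {
                    types.put(localName, typeInfo.getTypeName());
                }
            }
        });

        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();
        reader.setContentHandler(validatorHandler);
        reader.parse(new InputSource(new StringReader(xml)));
        return types;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementTypes(XSD, XML));
    }
}
```

TypeInfo exposes only the type name, namespace, and derivation checks, so annotations such as sml:acyclic would still need the Xerces PSVI interfaces.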

This snippet demonstrates the use of PSVI in determining the value of the sml:acyclic attribute. The code retrieves the annotation of the type associated with the current element to determine if the attribute sml:acyclic is set:


ElementPSVI elementPSVI = psviProvider.getElementPSVI();
XSTypeDefinition typeDefinition = elementPSVI.getTypeDefinition();

// Guard the cast: the element's type may be a simple type
if (typeDefinition instanceof XSComplexTypeDefinition)
{
   XSObjectList annotationList = ((XSComplexTypeDefinition)typeDefinition).getAnnotations();
   DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

   for (int i = 0, annotationCount = annotationList.getLength(); i < annotationCount; i++)
   {
      // Use a fresh document per annotation, since a DOM document can hold only one root element
      Document domDocument = documentBuilder.newDocument();
      XSAnnotation annotation = (XSAnnotation)annotationList.item(i);
      annotation.writeAnnotation(domDocument, XSAnnotation.W3C_DOM_DOCUMENT);
      Node acyclicAttribute = domDocument.getFirstChild().getAttributes().getNamedItemNS(ISMLConstants.SML_URI, ISMLConstants.ACYCLIC_ATTRIBUTE);
      if (acyclicAttribute != null)
      {
         System.out.println(acyclicAttribute.getNodeValue());
      }
   }
}

Implementation Detail

All data builders used to construct structures based on definition documents are expected to be replaced with PSVI. There are currently three phases to the validation process:

Constructing the data structures required by each validator
Executing validators to verify SML constraints
Checking schematron constraints

Each validator currently registers the set of data structures it requires for validating a constraint. The data builders associated with a validator are content handlers that are invoked while the SML-IF document is parsed.
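The relationship between validators and data builders can be sketched roughly as follows. This is not the COSMOS code: the interfaces DataBuilder and Validator, the builder id "elementCount", and the exactCount constraint are all hypothetical names chosen to show the first two phases (build structures, then validate) in a few lines.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of validators registering the data builders they need
class ValidatorRegistrySketch {

    /** Hypothetical data builder: collects a structure while the document is parsed. */
    interface DataBuilder {
        void startElement(String localName);
        Object getDataStructure();
    }

    /** Hypothetical validator: names the builders it needs, then checks a constraint. */
    interface Validator {
        List<String> requiredBuilderIds();
        boolean validate(Map<String, Object> dataStructures);
    }

    /** Example builder that counts how often each element name occurs. */
    static class ElementCountBuilder implements DataBuilder {
        private final Map<String, Integer> counts = new LinkedHashMap<>();
        public void startElement(String localName) { counts.merge(localName, 1, Integer::sum); }
        public Object getDataStructure() { return counts; }
    }

    static boolean run(List<String> elementNames, Map<String, DataBuilder> available, Validator validator) {
        // Phase 1: feed parse events only to the builders the validator registered
        List<DataBuilder> active = new ArrayList<>();
        for (String id : validator.requiredBuilderIds()) {
            DataBuilder builder = available.get(id);
            if (builder == null) {
                throw new IllegalArgumentException("No data builder registered as " + id);
            }
            active.add(builder);
        }
        for (String name : elementNames) {
            for (DataBuilder builder : active) {
                builder.startElement(name);
            }
        }
        // Phase 2: hand the finished structures to the validator
        Map<String, Object> structures = new LinkedHashMap<>();
        for (String id : validator.requiredBuilderIds()) {
            structures.put(id, available.get(id).getDataStructure());
        }
        return validator.validate(structures);
    }

    /** Hypothetical constraint: exactly `expected` elements named `name` must occur. */
    static Validator exactCount(String name, int expected) {
        return new Validator() {
            public List<String> requiredBuilderIds() { return Collections.singletonList("elementCount"); }
            public boolean validate(Map<String, Object> dataStructures) {
                Map<?, ?> counts = (Map<?, ?>) dataStructures.get("elementCount");
                return Integer.valueOf(expected).equals(counts.get(name));
            }
        };
    }
}
```

In this sketch the element names stand in for SAX events; in the design above, the builders would instead be driven by the content handlers during the parse.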

The first and second phases will be affected by this enhancement. The SMLMainValidator will be modified to parse through an SML-IF document using three different content handlers:

HeaderContentHandler
DefinitionContentHandler
InstanceContentHandler

HeaderContentHandler is used to determine the identity, rule binding, and schema binding of the SML-IF document. The structures built by this content handler are used later during the parsing process to bind definition documents with instance documents. DefinitionContentHandler is used to gather all schemas that are to be used when validating instance documents. InstanceContentHandler parses each instance document with schema validation enabled to build the data structures required for validating the SML constraints. Once the document content is parsed, SMLMainValidator invokes each validator to check the state of each constraint.

Figure 1.1 depicts the validator's flow when processing an SML-IF document:

Figure 1.1 - SML Validator's flow
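The hand-off between the three content handlers can be sketched with plain SAX. The code below is only an illustration under stated assumptions: it uses simplified, un-namespaced element names modelled on SML-IF (model, identity, definitions, instances, document), and CountingHandler is a hypothetical placeholder for HeaderContentHandler, DefinitionContentHandler, and InstanceContentHandler.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: route SAX events of an SML-IF-like document to three handlers
class SmlifRoutingSketch {

    // Simplified sample; a real SML-IF document is namespaced and carries more content
    static final String SAMPLE =
        "<model><identity><name>demo</name></identity>"
        + "<definitions><document/></definitions>"
        + "<instances><document/><document/></instances></model>";

    /** Placeholder handler that only counts the start-element events it receives. */
    static class CountingHandler extends DefaultHandler {
        int elements;
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Returns the event counts seen by the header, definition, and instance handlers. */
    static int[] route(String smlif) throws Exception {
        CountingHandler header = new CountingHandler();
        CountingHandler definition = new CountingHandler();
        CountingHandler instance = new CountingHandler();

        DefaultHandler router = new DefaultHandler() {
            private int depth = 0;
            private ContentHandler current = header;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts)
                    throws SAXException {
                depth++;
                if (depth == 2) {
                    // Children of the root select the handler for everything nested below them
                    if ("definitions".equals(localName)) {
                        current = definition;
                    } else if ("instances".equals(localName)) {
                        current = instance;
                    } else {
                        current = header;
                    }
                }
                if (depth >= 2) {
                    current.startElement(uri, localName, qName, atts);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(smlif)), router);
        return new int[] { header.elements, definition.elements, instance.elements };
    }

    public static void main(String[] args) throws Exception {
        int[] counts = route(SAMPLE);
        System.out.println(counts[0] + " " + counts[1] + " " + counts[2]);
    }
}
```

A single routing handler keeps the parse to one pass; whether SMLMainValidator switches handlers this way or parses the document in several passes is not specified in this design.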

Data Builders

The following data builders will need to be modified/removed:

AbstractDeclarationBuilder.java - Abstract class for classes such as GroupDeclarationBuilder
AcyclicDataTypesList.java - Extract complex types that have sml:acyclic set to true
ComplexTypeElementBuilder.java - Stores complex type declaration
ElementDeclarationBuilder.java - Stores global element declaration
ElementSchematronCacheBuilder.java - Stores schematron associated with an element/type declaration
ElementTypeMapDataBuilder.java - Stores the relationship between the element names and their associated type.
GroupDeclarationBuilder.java - Stores group declarations
IdentityConstraintDataBuilder.java - Stores the identity constraints associated with elements
SchemaBindingDataBuilder.java - Used for schema binding
ElementSourceBuilder.java - Stores the source for definition/instance documents
SMLValidatingBuilder.java - Stores elements
SubstitutionBuilder - Used for substitution groups
TargetSchemaBuilder.java - Element declarations with target* constraints
TargetSchemaBuilder.java - Stores type declarations
TypeInheritanceDataBuilderImpl.java - Keeps track of type inheritance

Task Breakdown

This section lists the tasks required to complete this enhancement:

Modify SMLMainValidator to invoke the three content handlers
Create HeaderContentHandler
Build the structures for HeaderContentHandler
Create DefinitionContentHandler
Build the structures for DefinitionContentHandler
Create InstanceContentHandler
Use the structures created by HeaderContentHandler and DefinitionContentHandler to invoke the data builders associated with each validator
Build the structures required for sml:acyclic
Complete the validator for acyclic
Build the structures required for target* constraints
Complete the validator for target* constraints
Build the structures required for identity constraints
Complete the validator for identity constraints
Test to make sure all existing test cases pass

References

Tutorial on PSVI API

Open Issues/Questions

How do we handle validating SML-IF documents that contain only schemas?
After conferring with Sandy Gao of the SML workgroup, the following approach was decided:
- For SML-IF documents containing no instance documents, we need to create a dummy element to parse the schemas with.
- Using PSVI, we can then retrieve an object of type XSModel, which represents all schemas in a validation set.
- The XSModel object can be examined by validators to ensure the schemas are syntactically correct.

Here are some clarifications from Sandy on the above summary:

The problem occurs not only when the IF contains only schema (document)s, but also when:
- The IF contains schema documents that are not used to validate any instance document
- The IF contains schema documents that are used to validate instance documents, but some components (e.g. element declarations or type definitions) are not used during the validation process.
Because it's difficult to predict whether all schema components will be used by instance documents, the dummy document should always be used for all schemas (yes, schemas, not schema documents) created according to the schema binding.
The XSModel represents *the schema* (singular) used to validate the instance document. That schema may be constructed from multiple schema documents. (Schema vs. schema document is an important distinction, and a source of confusion.)
When the schema is constructed (as part of the construction of the JAXP schema), all schema constraints are checked. If you didn't receive any error at that step, then when you get the XSModel from PSVI, it's guaranteed to be correct, both syntactically and semantically.
What really need to be checked are SML constraints specified on schema components. Both syntactic and semantic problems about SML constraints should be checked and reported, not only syntactic ones.
It's possible for schema documents to not be used to construct any schema. Such documents still need to be checked to make sure they are good schema documents (syntactically). If there are SML constraints specified in such schema documents, they also need to be checked syntactically. It may be sufficient to validate such documents against a schema constructed from 2 schema documents: the "schema for schemas" and the "schema for SML". The spec needs some clarification on what's the exact requirement.
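The point that schema constraints are checked when the JAXP schema is constructed can be illustrated with a short sketch. It assumes the standard JAXP SchemaFactory and made-up schema documents in the urn:example namespaces; retrieving the XSModel from PSVI and checking SML constraints are not shown, since both go beyond the standard JAXP API.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch: one schema constructed from several schema documents
class SchemaConstructionSketch {

    // Illustrative schema documents (not taken from COSMOS)
    static final String GOOD_A =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:example:a'>"
        + "<xs:element name='a' type='xs:string'/></xs:schema>";
    static final String GOOD_B =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:example:b'>"
        + "<xs:element name='b' type='xs:int'/></xs:schema>";
    // References a type that is defined nowhere, so schema construction should fail
    static final String BAD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:c='urn:example:c'"
        + " targetNamespace='urn:example:c'>"
        + "<xs:element name='c' type='c:MissingType'/></xs:schema>";

    /** Returns null if a single schema could be built from all documents, else the error text. */
    static String buildSchema(String... schemaDocuments) {
        Source[] sources = new Source[schemaDocuments.length];
        for (int i = 0; i < schemaDocuments.length; i++) {
            sources[i] = new StreamSource(new StringReader(schemaDocuments[i]));
        }
        try {
            // Schema constraints are checked here, before any instance document is parsed
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(sources);
            return null;
        } catch (SAXException e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println("good: " + buildSchema(GOOD_A, GOOD_B));
        System.out.println("bad:  " + buildSchema(GOOD_A, BAD));
    }
}
```

Because the check happens at construction time, it covers schema components that no instance document ever exercises, which is the case the clarifications above are concerned with.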

All reviewer feedback should go in the Talk page for 237921.

@@ Line 10: / Line 10: @@
 |06/23/2008
 |<ul><li>Initial creation</li></ul>
+|-
+|David Whiteman
+|07/30/2008
+|<ul><li>Added notes to Open Issues section</li></ul>
 |}
@@ Line 30: / Line 34: @@
 |-
 | align="left" | Test
-| 1.5
+| 2.5
 | Ali Mehregani
 |-
@@ Line 46: / Line 50: @@
 |-
 ! align="right" | TOTAL
-| 8
+| 9
 |
 |}
@@ Line 52: / Line 56: @@
 == Terminologies/Acronyms ==
-The terminologies/acronyms below are commonly used throughout this document.  The list below defines each term regarding how it is used in this document:
+The terminologies/acronyms below are commonly used throughout this document.
 {|{{BMTableStyle}}
@@ Line 70: / Line 74: @@
 == Purpose ==
-This enhancement is associated with [https://bugs.eclipse.org/bugs/show_bug.cgi?id=237921 bugzilla 237921].
+This document is associated with [https://bugs.eclipse.org/bugs/show_bug.cgi?id=237921 bugzilla 237921] and [https://bugs.eclipse.org/bugs/show_bug.cgi?id=237872 bugzilla 237872].
 <p>
-The purpose of the feature is to add support for PSVI to data builders used in constructing structures required for validation of SML constraints.  Currently the validator parses through each definition document to determine element types, their derivation and any associated SML constraints.  Manual parsing of definition documents negatively impacts performance and memory consumption required by the validator.  Another disadvantage of manual parsing is the inability to cover all possible cases.  There are unpredictable methods of how schema can be used.  It's preferred to rely on an established interface as opposed to manually parse through each schema document.
+The purpose of the feature is to use PSVI when constructing structures required for validating SML constraints.  Currently the validator parses through each definition document to determine element types, their derivation, and any associated SML constraints.  Manual parsing of definition documents negatively impacts performance and memory consumption required by the validator.  Another disadvantage of manual parsing is the inability to cover all possible cases.  It's preferred to rely on an established interface as opposed to manually parse through each schema document.
 </p>
@@ Line 102: / Line 106: @@
 </code>
-The following code is used to determine if sml:acyclic is set on the current element.  The code retrieves the annotation of the type associated with the current element to determine if the attribute sml:acyclic is set:
+This snippet demonstrates the use of PSVI in determining the value of the sml:acyclic attribute.  The code retrieves the annotation of the type associated with the current element to determine if the attribute sml:acyclic is set:
 <code>
@@ Line 135: / Line 139: @@
 # Checking schematron constraints
-A validator is currently used to register a set of data structures it requires for validating a constraint.  The data builder associated with a validator are content handlers that are invoked when parsing through an SML-IF document.
+A validator is currently used to register a set of data structures it requires for validating a constraint.  The data builders associated with a validator are content handlers that are invoked when parsing through an SML-IF document.
-The first and second phase will be affected by this enhancement.  The SMLMainValidator will be modified to parse through an SML-IF document using three different content handlers:
+The first and second phases will be affected by this enhancement.  The SMLMainValidator will be modified to parse through an SML-IF document using three different content handlers:
 # HeaderContentHandler
@@ Line 143: / Line 147: @@
 # InstanceContentHandler
-HeaderContentHandler is used to determine the identity, rule binding, and the schema binding of the SML-IF document.  The structures build by this content handler is used later during the parsing process to bind definition documents with instance documents.
+HeaderContentHandler is used to determine the identity, rule binding, and the schema binding of the SML-IF document.  The structures build by this content handler are used later during the parsing process to bind definition documents with instance documents.  DefinitionContentHandler is used to gather all schemas that are to be used when validating instance documents.  InstanceContentHandler schema parses each instance document to build the data structures required for validating the SML constraints.  Once the document content is parsed, SMLMainValidator invokes each validator to check the state of each constraint.
-DefinitionContentHandler is used to gather all schemas that are to be used when validating instance documents.
+Figure 1.1 depicts the validator's flow when processing an SML-IF document:
+<br/><br/>
-InstanceContentHandler schema parses each instance document to build the data structures required for validating the SML constraints.
+[[Image:237921-0.png]] <br/>
+<font size='.1'><b>Figure 1.1 - SML Validator's flow</b></font>
-Once the document content is parsed, SMLMainValidator invokes each validator to check the state of each constraint.
+=== Data Builders ===
-=== Graphical View ===
+The following data builders will need to be modified/removed:
-The top most section of the view will consist of a graphical representation of the response.  The section will leverage dojox.gfx APIs along with atomic shapes to render a graph.  The APIs use SVG in FireFox and VML in IE to render the individual shapes.  The end result is a static diagram with cross-browser support.
+* AbstractDeclarationBuilder.java - Abstract class for classes such as GroupDeclarationBuilder
+* AcyclicDataTypesList.java - Extract complex types that have sml:acyclic set to true
-The image below shows a sample of what the graph will look like:
+* ComplexTypeElementBuilder.java - Stores complex type declaration
+* ElementDeclarationBuilder.java - Stores global element declaration
-<p>
+* ElementSchematronCacheBuilder.java - Stores schematron associated with an element/type declaration
-[[Image: graph_response_visual.png]]
+* ElementTypeMapDataBuilder.java - Stores the relationship between the element names and their associated type.
-</p>
+* GroupDeclarationBuilder.java - Stores group declarations
+* IdentityConstraintDataBuilder.java - Stores the identity constraints associated with elements
-The graph layout algorithm renders adjacent nodes closer to each other.  This will visually indicate entities that are closer together in relation.  The radius of each node is determined by its order (i.e. outgoing edges from the node) and the color of the node is dependent on the template it belongs to.  Notice a legend is displayed at the bottom of the diagram.
+* SchemaBindingDataBuilder.java - Used for schema binding
+* ElementSourceBuilder.java - Stores the source for definition/instance documents
-The graph layout algorithm is not sufficient for responses with many items or relationships.  The graph is best suited for 30 or less nodes with a maximum node order of 3.  The graph view will not be generated if this condition is not met.  A warning will be displayed and the space will be reserved for the logical representation (see next section).  Users will have the ability to discard the warning and display the graph if they choose to populate the view.
+* SMLValidatingBuilder.java - Stores elements
+* SubstitutionBuilder - Used for substitution groups
+* TargetSchemaBuilder.java - Element declarations with target* constraints
-The image below displays a concrete example based on data from the student-teacher MDR.  The value of this graphical overview is the ability to quickly identify relationships between two different entities.  Using this diagram, the user is quickly able to identify the following relationships:
+* TargetSchemaBuilder.java - Stores type declarations
+* TypeInheritanceDataBuilderImpl.java - Keeps track of type inheritance
-* Staff01 teaches Student01 and Student03
-* Staff02 teaches Student01, Student02, Student03
-<p>
-[[Image: graph_response_visual2.png]]
-</p>
-=== Logical View ===
-A tree structure is provided to allow traversal of the response elements.  Unlike a grid structure, a tree will only occupy the space needed.  The properties view will be used to display any fields associated with a selected tree node.  The image below provides an overview of the tree structure:
-<p>
-[[Image:graph_response_logical.png]]
-</p>
-Each level of the tree is described below:
-* First level: item/relationship template ID
-* Second level: the first item/relationship local ID
-* Third level: record ID
-The properties view will be populated based on selection of nodes.  The following fields will be displayed for each tree item selected:
-{|{{BMTableStyle}}
-!align="left"|Tree Item:
-!align="left"|Fields:
-|-
-|Item/Relationship template
-|No fields will be displayed
-|-
-|Item/Relationship
-|
-* All instance IDs
-* Additional record types
-* For relationships, source and target will also be displayed
-|-
-|Records
-|
-* Last modified
-* Snapshot ID
-* Base line ID
-|}
-Double clicking a record will open a dialog box with the XML content of the associated record.
 == Task Breakdown ==
@@ Line 217: / Line 179: @@
 The following section includes the tasks required to complete this enhancement
-# Isolate the query response view as a separate test page (the one that currently exists is outdated) <font color="green">[complete]</font>
+# Modify SMLMainValidator to invoke the three content handlers
-# Define a structure to be used by the graph layout algorithm <font color="green">[complete]</font>
+# Create HeaderContentHandler
-# Modify outputters to generate the structure <font color="green">[complete]</font>
+# Build the structures for HeaderContentHandler
-# Create the graph layout algorithm <font color="green">[complete]</font>
+# Create DefinitionContentHandler
-# Write code to render the graph <font color="green">[complete]</font>
+# Build the structures for DefinitionContentHandler
-# Define a structure to be used by the logical tree structure (determine if the same structure as the graph layout can be used) <font color="green">[complete]</font>
+# Create InstanceContentHandler
-# Modify outputters to generate the structure <font color="green">[complete]</font>
+# Use the structures created by HeaderContentHandler and DefinitionContentHandler to invoke the data builders associated with each validator
-# Create a tree based on the structure <font color="green">[complete]</font>
+# Build the structures required for sml:acyclic
-# Decorate the tree items with icons
+# Complete the validator for acyclic
-# Use topic publication/subscription to detect tree item selections <font color="green">[complete]</font>
+# Build the structures required for target* constraints
-# Populate the properties view based on the selection <font color="green">[complete]</font>
+# Complete the validator for target* constraints
-# Create the overall layout of the view <font color="green">[complete]</font>
+# Build the structures required for identity constraints
-# Add the graph and logical component to the main view <font color="green">[complete]</font>
+# Complete the validator for identity constraints
-# Determine how to add this as an alternative view to displaying the CMDBf query response
+# Test to make sure all existing test cases pass
-== Future Direction ==
+== References ==
-* Showing direction of relationships in the graph view
+* [http://www.idealliance.org/papers/dx_xmle04/papers/02-05-02/02-05-02.html#s4.2.2 Tutorial on PSVI API]
-* Displaying edges only when user hovers over a node
-* Adding additional interaction (e.g. display information when hovering over a node)
-* Once COSMOS has the capability of adding user profiles, add the ability to disable parts of the view users may not be interested in.
 == Open Issues/Questions ==
-All reviewer feedback should go in the [[Talk:COSMOS_Design_224166|Talk page for 224166]].
+* How do we handle validating SML-IF documents that contain only schemas?<br/>After conferring with Sandy Gao of the SML workgroup, the following approach was decided:
+** For SML-IF documents containing no instance documents, we need to create a dummy element to parse the Schemas with.
+** Using PSVI, we can then retrieve an object of type XSModel which basically represents all schemas in a validation set
+** The XSModel object can be examined by validators to ensure the schemas are syntactically correct
+Here are some clarifications from Sandy on the above summary:
+* The problem occurs not only when the IF contains only schema (document)s, but also when:
+** The IF contains schema documents that are not used to validate any instance document
+** The IF contains schema documents that are used to validate instance documents, but some components (e.g. element declarations or type definitions) are not used during the validation process.
+* Because it's difficult to predict whether all schema components will be used by instance documents, the dummy document should always be used for all schemas (yes, schemas, not schema documents) created according to the schema binding.
+* The XSModel represents *the schema* (singular) used to validate the instance document. That schema may be constructed from multiple schema documents. (Schema vs. schema document is an important distinction, and a source of confusion.)
+* When the schema is constructed (as part of the construction of the JAXP schema), all schema constraints are checked. If you didn't receive any error at that step, then when you get the XSModel from PSVI, it's guaranteed to be correct, both syntactically and semantically.
+* What really need to be checked are SML constraints specified on schema components. Both syntactic and semantic problems about SML constraints should be checked and reported, not only syntactic ones.
+* It's possible for schema documents to not be used to construct any schema. Such documents still need to be checked to make sure they are good schema documents (syntactically). If there are SML constraints specified in such schema documents, they also need to be checked syntactically. It may be sufficient to validate such documents against a schema constructed from 2 schema documents: the "schema for schemas" and the "schema for SML". The spec needs some clarification on what's the exact requirement.
+All reviewer feedback should go in the [[Talk:COSMOS_Design_237921|Talk page for 237921]].
 ----
 [[Category:COSMOS_Bugzilla_Designs]]
