COSMOS Design 237921

Support PSVI in SML Validation

Change History

Name:	Date:	Revised Sections:
Ali Mehregani	06/23/2008	Initial creation
David Whiteman	07/30/2008	Added notes to Open Issues section

Workload Estimation

Rough workload estimate in person weeks
Process	Sizing	Names of people doing the work
Design	.5	Ali Mehregani
Code	6	Ali Mehregani / David Whiteman
Test	2.5	Ali Mehregani
Documentation	0
Build and infrastructure	0
Code review, etc.*	0
TOTAL	9

Terminologies/Acronyms

The terminologies/acronyms below are commonly used throughout this document.

Term	Definition
SML	Service Modeling Language
SML-IF	Service Modeling Language - Interchange Format
PSVI	Post-Schema-Validation Infoset

Purpose

This document is associated with Bugzilla bugs 237921 and 237872.

The purpose of this feature is to use the PSVI when constructing the structures required to validate SML constraints. Currently the validator parses each definition document itself to determine element types, their derivation, and any associated SML constraints. This manual parsing increases the validator's processing time and memory consumption, and it is hard to make it cover every possible schema construct. It is preferable to rely on an established interface rather than manually parsing each schema document.

What is PSVI?

The Post-Schema-Validation Infoset (PSVI) is the XML infoset augmented with the information contributed by schema validation, such as element and attribute declarations, type definitions, and validity. Xerces gives applications access to this schema-level information through both DOM and SAX. The snippet below shows how the PSVI provider can be retrieved when SAX-parsing a document:


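// Requires Apache Xerces-J on the classpath: PSVIProvider is org.apache.xerces.xs.PSVIProvider
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.apache.xerces.xs.PSVIProvider;

SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
saxParserFactory.setNamespaceAware(true);
saxParserFactory.setValidating(true);
// Synthesize annotations for non-schema attributes such as sml:acyclic
saxParserFactory.setFeature("http://apache.org/xml/features/generate-synthetic-annotations", true);

SAXParser newSaxParser = saxParserFactory.newSAXParser();
// The schema language must be set before the schema source
newSaxParser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
newSaxParser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource", <LIST OF SCHEMA INPUT>);

PSVIProvider psviProvider = (PSVIProvider) newSaxParser.getXMLReader();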

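Given psviProvider, the PSVI of the current element or attribute can be retrieved with psviProvider.getElementPSVI() or psviProvider.getAttributePSVI(...). These methods return meaningful results only while the parser is inside the startElement or endElement callback of the content handler. The following snippet determines the base type of the current element's type: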


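// ElementPSVI and XSTypeDefinition are in org.apache.xerces.xs; call only inside startElement()/endElement()
ElementPSVI elementPSVI = psviProvider.getElementPSVI();
XSTypeDefinition typeDefinition = elementPSVI.getTypeDefinition();
XSTypeDefinition baseType = typeDefinition.getBaseType();
// getName() returns null if the base type is anonymous
System.out.println("The base type of the current element is: " + baseType.getName());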
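PSVIProvider and ElementPSVI are Xerces-specific and are not part of JAXP. For comparison, the JDK's standard javax.xml.validation API exposes a smaller slice of the same information (type name and derivation checks) through TypeInfoProvider, which is likewise only valid inside startElement/endElement. The sketch below is not how the COSMOS validator works; it assumes a hypothetical schema in the namespace urn:example where DerivedType extends BaseType, and the class name TypeInfoSketch is illustrative. TypeInfo does not expose annotations, so it could not replace the sml:acyclic lookup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** JDK-only sketch: reports each element's type and whether it extends a hypothetical BaseType. */
final class TypeInfoSketch {
    static final String NS = "urn:example"; // hypothetical target namespace

    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:ex='urn:example'"
        + " targetNamespace='urn:example'>"
        + "<xs:complexType name='BaseType'><xs:sequence/></xs:complexType>"
        + "<xs:complexType name='DerivedType'><xs:complexContent>"
        + "<xs:extension base='ex:BaseType'/></xs:complexContent></xs:complexType>"
        + "<xs:element name='item' type='ex:DerivedType'/>"
        + "</xs:schema>";

    /** Returns one "localName:typeName:extendsBaseType" entry per element. */
    static List<String> describeElements(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        ValidatorHandler validatorHandler = schema.newValidatorHandler();
        TypeInfoProvider typeInfoProvider = validatorHandler.getTypeInfoProvider();
        List<String> results = new ArrayList<>();

        validatorHandler.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Like getElementPSVI(), this is only valid inside startElement()/endElement()
                TypeInfo type = typeInfoProvider.getElementTypeInfo();
                if (type == null) {
                    results.add(localName + ":unknown:false");
                    return;
                }
                boolean extendsBase = type.isDerivedFrom(NS, "BaseType", TypeInfo.DERIVATION_EXTENSION);
                results.add(localName + ":" + type.getTypeName() + ":" + extendsBase);
            }
        });

        // A namespace-aware SAX parser feeds its events through the validator
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();
        reader.setContentHandler(validatorHandler);
        reader.parse(new InputSource(new StringReader(xml)));
        return results;
    }
}
```

For the instance document <item xmlns='urn:example'/>, the sketch records the type name DerivedType and reports that it extends BaseType.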

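This snippet demonstrates the use of PSVI in determining the value of the sml:acyclic attribute. The code retrieves the annotations of the type associated with the current element to determine whether the attribute sml:acyclic is set. Because sml:acyclic is a non-schema attribute, Xerces reports it on the type's annotation; if the type has no xs:annotation child, an annotation is only synthesized when the generate-synthetic-annotations feature is enabled, as in the parser setup above: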


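import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.xerces.xs.*;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;

ElementPSVI elementPSVI = psviProvider.getElementPSVI();
XSTypeDefinition typeDefinition = elementPSVI.getTypeDefinition();

// sml:acyclic is specified on complex type definitions
if (typeDefinition != null && typeDefinition.getTypeCategory() == XSTypeDefinition.COMPLEX_TYPE)
{
   XSObjectList annotationList = ((XSComplexTypeDefinition) typeDefinition).getAnnotations();

   DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
   factory.setNamespaceAware(true);
   DocumentBuilder documentBuilder = factory.newDocumentBuilder();

   for (int i = 0, annotationCount = annotationList.getLength(); i < annotationCount; i++)
   {
      // A DOM document holds only one root element, so write each annotation to a new document
      Document domDocument = documentBuilder.newDocument();
      XSAnnotation annotation = (XSAnnotation) annotationList.item(i);
      annotation.writeAnnotation(domDocument, XSAnnotation.W3C_DOM_DOCUMENT);

      // The root element is the (possibly synthetic) xs:annotation carrying the non-schema attributes
      Attr acyclicAttribute = domDocument.getDocumentElement().getAttributeNodeNS(ISMLConstants.SML_URI, ISMLConstants.ACYCLIC_ATTRIBUTE);
      if (acyclicAttribute != null)
      {
         System.out.println(acyclicAttribute.getValue());
      }
   }
}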
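Instead of writing each annotation into a DOM document, the string form returned by XSAnnotation.getAnnotationString() could be parsed directly. The helper below is a JDK-only sketch of that check, not the COSMOS implementation. It assumes the SML 1.1 namespace URI http://www.w3.org/ns/sml (which ISMLConstants.SML_URI is expected to hold), the class name AcyclicAnnotationCheck is hypothetical, and the value is interpreted with xs:boolean lexical rules, where both true and 1 mean true.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical helper: checks an xs:annotation string for an sml:acyclic attribute that is true. */
final class AcyclicAnnotationCheck {
    // Assumption: the SML 1.1 namespace, which ISMLConstants.SML_URI is expected to hold
    static final String SML_URI = "http://www.w3.org/ns/sml";

    static boolean isAcyclic(String annotationString) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getAttributeNS to match by namespace
        Element annotation = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(annotationString)))
                .getDocumentElement();
        // getAttributeNS returns "" when the attribute is absent
        String value = annotation.getAttributeNS(SML_URI, "acyclic").trim();
        // xs:boolean accepts "true" and "1" as true
        return value.equals("true") || value.equals("1");
    }
}
```

The annotation strings a real caller would pass come from Xerces; any hand-written xs:annotation element carrying the attribute works for trying the helper out.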

Implementation Detail

All data builders that construct structures from definition documents are expected to be replaced with PSVI-based equivalents. The validation process currently has three phases:

Constructing the data structures required by each validator
Executing validators to verify SML constraints
Checking Schematron constraints

Currently, each validator registers the set of data structures it requires to validate a constraint. The data builders associated with a validator are content handlers that are invoked while the SML-IF document is parsed.

The first and second phases will be affected by this enhancement. The SMLMainValidator will be modified to parse through an SML-IF document using three different content handlers:

HeaderContentHandler
DefinitionContentHandler
InstanceContentHandler

HeaderContentHandler determines the identity, rule bindings, and schema bindings of the SML-IF document. The structures built by this content handler are used later in the parsing process to bind definition documents to instance documents. DefinitionContentHandler gathers all schemas that are to be used when validating instance documents. InstanceContentHandler parses each instance document with schema validation enabled and builds the data structures required for validating the SML constraints. Once the document content is parsed, SMLMainValidator invokes each validator to check the state of each constraint.
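The dispatch described above can be sketched as a single SAX handler that picks a delegate based on the current top-level child of the SML-IF model element: identity, ruleBindings, and schemaBindings go to the header handler, definitions to the definition handler, and instances to the instance handler. SmlIfSectionRouter and ElementNameRecorder are hypothetical names, not COSMOS classes. For brevity the sketch ignores the SML-IF namespace and forwards only element and character events; real delegates would also need prefix mappings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical router: forwards SAX events to the handler for the current SML-IF section. */
class SmlIfSectionRouter extends DefaultHandler {
    private final ContentHandler header;
    private final ContentHandler definitions;
    private final ContentHandler instances;
    private ContentHandler current; // null while outside any section
    private int depth;              // 1 = <model>, 2 = a section element such as <definitions>

    SmlIfSectionRouter(ContentHandler header, ContentHandler definitions, ContentHandler instances) {
        this.header = header;
        this.definitions = definitions;
        this.instances = instances;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        depth++;
        if (depth == 2) {
            // A direct child of <model> selects the delegate for everything beneath it
            if (localName.equals("definitions")) {
                current = definitions;
            } else if (localName.equals("instances")) {
                current = instances;
            } else {
                current = header; // identity, ruleBindings, schemaBindings
            }
        }
        if (current != null) {
            current.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (current != null) {
            current.endElement(uri, localName, qName);
        }
        if (depth == 2) {
            current = null; // leaving the section
        }
        depth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (current != null) {
            current.characters(ch, start, length);
        }
    }

    /** Parses an SML-IF document with a namespace-aware SAX parser. */
    static void parse(String smlIf, DefaultHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(smlIf)), handler);
    }
}

/** Hypothetical stand-in for a section handler: records element local names. */
class ElementNameRecorder extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add(localName);
    }
}
```

Keeping the routing in one place means each content handler only sees the events of its own section and can stay unaware of the SML-IF envelope.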

Figure 1.1 depicts the validator's flow when processing an SML-IF document:

Figure 1.1 - SML Validator's flow

Data Builders

The following data builders will need to be modified or removed:

AbstractDeclarationBuilder.java - Abstract class for classes such as GroupDeclarationBuilder
AcyclicDataTypesList.java - Extracts complex types that have sml:acyclic set to true
ComplexTypeElementBuilder.java - Stores complex type declarations
ElementDeclarationBuilder.java - Stores global element declarations
ElementSchematronCacheBuilder.java - Stores the Schematron rules associated with an element/type declaration
ElementTypeMapDataBuilder.java - Stores the relationship between element names and their associated types
GroupDeclarationBuilder.java - Stores group declarations
IdentityConstraintDataBuilder.java - Stores the identity constraints associated with elements
SchemaBindingDataBuilder.java - Used for schema binding
ElementSourceBuilder.java - Stores the source for definition/instance documents
SMLValidatingBuilder.java - Stores elements
SubstitutionBuilder - Used for substitution groups
TargetSchemaBuilder.java - Element declarations with target* constraints
TargetSchemaBuilder.java - Stores type declarations
TypeInheritanceDataBuilderImpl.java - Keeps track of type inheritance

Task Breakdown

This section lists the tasks required to complete this enhancement:

Modify SMLMainValidator to invoke the three content handlers
Create HeaderContentHandler
Build the structures for HeaderContentHandler
Create DefinitionContentHandler
Build the structures for DefinitionContentHandler
Create InstanceContentHandler
Use the structures created by HeaderContentHandler and DefinitionContentHandler to invoke the data builders associated with each validator
Build the structures required for sml:acyclic
Complete the validator for sml:acyclic
Build the structures required for target* constraints
Complete the validator for target* constraints
Build the structures required for identity constraints
Complete the validator for identity constraints
Test to make sure all existing test cases pass

References

Tutorial on PSVI API

Open Issues/Questions

How do we handle validating SML-IF documents that contain only schemas?
After conferring with Sandy Gao of the SML Working Group, the following approach was decided:
- For SML-IF documents containing no instance documents, we need to create a dummy element with which to parse the schemas.
- Using PSVI, we can then retrieve an object of type XSModel, which represents all schemas in a validation set.
- The XSModel object can be examined by validators to ensure the schemas are syntactically correct.

Here are some clarifications from Sandy on the above summary:

The problem occurs not only when the IF contains only schema (document)s, but also when:
- The IF contains schema documents that are not used to validate any instance document
- The IF contains schema documents that are used to validate instance documents, but some components (e.g. element declarations or type definitions) are not used during the validation process.
Because it's difficult to predict whether all schema components will be used by instance documents, the dummy document should always be used for all schemas (yes, schemas, not schema documents) created according to the schema binding.
The XSModel represents *the schema* (singular) used to validate the instance document. That schema may be constructed from multiple schema documents. (Schema vs. schema document is an important distinction, and a source of confusion.)
When the schema is constructed (as part of the construction of the JAXP schema), all schema constraints are checked. If you didn't receive any error at that step, then when you get the XSModel from PSVI, it's guaranteed to be correct, both syntactically and semantically.
What really needs to be checked are the SML constraints specified on schema components. Both syntactic and semantic problems with SML constraints should be checked and reported, not only syntactic ones.
Some schema documents may not be used to construct any schema. Such documents still need to be checked to make sure they are good schema documents (syntactically). If such schema documents specify SML constraints, those also need to be checked syntactically. It may be sufficient to validate such documents against a schema constructed from two schema documents: the "schema for schemas" and the "schema for SML". The spec needs some clarification on what the exact requirement is.
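Sandy's point that schema constraints are checked while the JAXP schema is constructed can be illustrated with the standard SchemaFactory API. The sketch below (SchemaConstructionCheck and collectSchemaErrors are hypothetical names) combines the given schema documents into one schema with newSchema(Source[]) and collects every error reported through an ErrorHandler; no instance document is involved. It only covers XML Schema constraints, not the SML constraints that still need separate checking.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: builds one schema from several schema documents and collects errors. */
final class SchemaConstructionCheck {
    static List<String> collectSchemaErrors(String... schemaDocuments) {
        List<String> errors = new ArrayList<>();
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored */ }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) { errors.add(e.getMessage()); }
        });

        Source[] sources = new Source[schemaDocuments.length];
        for (int i = 0; i < schemaDocuments.length; i++) {
            sources[i] = new StreamSource(new StringReader(schemaDocuments[i]));
        }
        try {
            // Schema constraints are checked while the combined schema is constructed
            factory.newSchema(sources);
        } catch (SAXException e) {
            // Some failures may surface only as an exception; record them if nothing was reported
            if (errors.isEmpty()) {
                errors.add(e.getMessage());
            }
        }
        return errors;
    }
}
```

A schema document whose element refers to an undefined type such as xs:noSuchType yields an unresolved-reference error, while a well-formed, consistent schema yields an empty list.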

All reviewer feedback should go in the Talk page for 237921.

Notice: This Wiki is now read only and edits are no longer possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.