COSMOS Design 237921

From Eclipsepedia

Support PSVI in SML Validation

Change History

Name            Date        Revised Sections
Ali Mehregani   06/23/2008  Initial creation
David Whiteman  07/30/2008  Added notes to Open Issues section

Workload Estimation

Rough workload estimate in person weeks
Process                   Sizing  Names of people doing the work
Design                    .5      Ali Mehregani
Code                      6       Ali Mehregani / David Whiteman
Test                      2.5     Ali Mehregani
Documentation             0
Build and infrastructure  0
Code review, etc.*        0
TOTAL                     9

Terminologies/Acronyms

The terms and acronyms below are used throughout this document.

Term Definition
SML Service Modeling Language
SML-IF Service Modeling Language - Interchange Format
PSVI Post Schema Validation Infoset

Purpose

This document is associated with bugzilla 237921 and bugzilla 237872.

The purpose of this feature is to use PSVI when constructing the structures required for validating SML constraints. Currently the validator parses each definition document itself to determine element types, their derivation, and any associated SML constraints. This manual parsing increases the validator's run time and memory consumption, and it cannot cover every possible case. Relying on an established interface is preferable to parsing each schema document manually.

What is PSVI?

The Post Schema Validation Infoset (PSVI) is the XML infoset augmented with the schema-level information that results from validating a document against a schema. Xerces exposes this information through a set of interfaces that can be used with either DOM or SAX parsing. The snippet below shows how the PSVI provider can be retrieved when parsing a document with SAX:

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.apache.xerces.xs.PSVIProvider;

SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
// XML Schema validation requires a namespace-aware parser
saxParserFactory.setNamespaceAware(true);
saxParserFactory.setFeature("http://apache.org/xml/features/generate-synthetic-annotations", true);
saxParserFactory.setFeature("http://xml.org/sax/features/validation", true);

SAXParser newSaxParser = saxParserFactory.newSAXParser();
newSaxParser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
newSaxParser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource", <LIST OF SCHEMA INPUT>);

// The cast succeeds only when the underlying parser is Xerces
PSVIProvider psviProvider = (PSVIProvider)newSaxParser.getXMLReader();

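Given psviProvider, the PSVI of the current element or of one of its attributes can be retrieved using psviProvider.getElementPSVI() or getAttributePSVI(...). These methods are meaningful only while the parser is reporting the start or end of an element, so they are called from within the content handler's callbacks. The following snippet of code determines the base type for the current element: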

ElementPSVI elementDeclaration = psviProvider.getElementPSVI();
XSTypeDefinition typeDefinition = elementDeclaration.getTypeDefinition();
System.out.println("The base type of the current element is: " + typeDefinition.getBaseType().getName());
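For readers without Xerces on the classpath, a roughly analogous lookup is available in the JDK through javax.xml.validation. The sketch below is not the PSVI API discussed in this document: it swaps in ValidatorHandler and TypeInfoProvider to illustrate the same idea of asking for the current element's type from inside startElement. The class name ElementTypeLookup, the schema, and the element names are assumptions made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ElementTypeLookup {
    // Hypothetical schema: DerivedType extends BaseType, and <root> is a DerivedType
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:complexType name='BaseType'><xs:sequence>"
      + "<xs:element name='child' type='xs:string' minOccurs='0'/>"
      + "</xs:sequence></xs:complexType>"
      + "<xs:complexType name='DerivedType'><xs:complexContent>"
      + "<xs:extension base='BaseType'/>"
      + "</xs:complexContent></xs:complexType>"
      + "<xs:element name='root' type='DerivedType'/>"
      + "</xs:schema>";

    /** Validates xml against xsd and returns "elementName:typeName" for each element. */
    static List<String> typeNames(String xsd, String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd)));
        ValidatorHandler validator = schema.newValidatorHandler();
        TypeInfoProvider types = validator.getTypeInfoProvider();
        List<String> seen = new ArrayList<>();

        // The validator forwards events to this handler after validating them
        validator.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                // Type information is only available inside start/end element callbacks
                TypeInfo info = types.getElementTypeInfo();
                seen.add(local + ":" + (info == null ? "?" : info.getTypeName()));
            }
        });

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(validator);
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(typeNames(XSD, "<root><child>x</child></root>"));
    }
}
```

TypeInfoProvider exposes far less than PSVI (no annotations and no schema model), so it would not be enough for the SML constraint lookups described here; it only shows the callback-scoped access pattern.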

The following snippet demonstrates the use of PSVI in determining the value of the sml:acyclic attribute. The code retrieves the annotations of the type associated with the current element to determine if the attribute sml:acyclic is set:

ElementPSVI elementDeclaration = psviProvider.getElementPSVI();
XSTypeDefinition typeDefinition = elementDeclaration.getTypeDefinition();

// Only complex type definitions are handled here; the cast fails for simple types
XSObjectList annotationList = ((XSComplexTypeDefinition)typeDefinition).getAnnotations();

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = factory.newDocumentBuilder();

for (int i = 0, annotationCount = annotationList.getLength(); i < annotationCount; i++)
{
   // Use a fresh document per annotation: a DOM document can hold only one root element
   Document domDocument = documentBuilder.newDocument();
   XSObject annotation = annotationList.item(i);
   ((XSAnnotation)annotation).writeAnnotation(domDocument, XSAnnotation.W3C_DOM_DOCUMENT);
   Node acyclicAttribute = domDocument.getFirstChild().getAttributes().getNamedItemNS(ISMLConstants.SML_URI, ISMLConstants.ACYCLIC_ATTRIBUTE);
   if (acyclicAttribute != null)
   {
      System.out.println(acyclicAttribute.getNodeValue());
   }
}

Implementation Detail

All data builders that construct structures from definition documents are expected to be replaced by PSVI-based lookups. There are currently three phases to the validation process:

  1. Constructing the data structures required by each validator
  2. Executing validators to verify SML constraints
  3. Checking schematron constraints

Each validator currently registers the set of data structures it requires for validating a constraint. The data builders associated with a validator are content handlers that are invoked while an SML-IF document is parsed.

The first and second phases will be affected by this enhancement. The SMLMainValidator will be modified to parse through an SML-IF document using three different content handlers:

  1. HeaderContentHandler
  2. DefinitionContentHandler
  3. InstanceContentHandler

HeaderContentHandler determines the identity, rule bindings, and schema bindings of the SML-IF document. The structures built by this content handler are used later in the parsing process to bind definition documents to instance documents. DefinitionContentHandler gathers all schemas that are to be used when validating instance documents. InstanceContentHandler parses each instance document with schema validation enabled and builds the data structures required for validating the SML constraints. Once the document content is parsed, SMLMainValidator invokes each validator to check the state of each constraint.
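One possible way to route a single SAX pass to three handlers is sketched below. It is an illustration only, not the SMLMainValidator implementation: the Dispatcher and Recorder classes are hypothetical, the recorders merely stand in for HeaderContentHandler, DefinitionContentHandler, and InstanceContentHandler, and the input uses a simplified SML-IF layout (a model element with identity, definitions, and instances children, no namespaces) chosen for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SectionDispatchSketch {
    /** Stand-in for a real content handler: records the element names it receives. */
    static class Recorder extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            names.add(local);
        }
    }

    /** Forwards element events to the handler that owns the current section. */
    static class Dispatcher extends DefaultHandler {
        final Recorder header = new Recorder();
        final Recorder definitions = new Recorder();
        final Recorder instances = new Recorder();
        private ContentHandler current = header;
        private int depth = 0;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            depth++;
            if (depth == 1) {
                return; // the <model> wrapper belongs to no section
            }
            if (depth == 2) {
                // A direct child of <model> starts a new section
                if ("definitions".equals(local)) current = definitions;
                else if ("instances".equals(local)) current = instances;
                else current = header;
            }
            current.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (depth > 1) {
                current.endElement(uri, local, qName);
            }
            depth--;
        }
    }

    static Dispatcher parse(String smlIf) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for reliable local names
        Dispatcher dispatcher = new Dispatcher();
        factory.newSAXParser().parse(new InputSource(new StringReader(smlIf)), dispatcher);
        return dispatcher;
    }

    public static void main(String[] args) throws Exception {
        Dispatcher d = parse("<model><identity><name>m</name></identity>"
            + "<definitions><document/></definitions>"
            + "<instances><document/></instances></model>");
        System.out.println(d.header.names + " " + d.definitions.names + " " + d.instances.names);
    }
}
```

A single pass keeps the document from being read three times, but the instance handler then depends on the header and definition sections appearing first; separate passes per handler would be an equally valid reading of the design.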

Figure 1.1 depicts the validator's flow when processing an SML-IF document:

237921-0.png
Figure 1.1 - SML Validator's flow

Data Builders

The following data builders will need to be modified/removed:

  • AbstractDeclarationBuilder.java - Abstract class for classes such as GroupDeclarationBuilder
  • AcyclicDataTypesList.java - Extract complex types that have sml:acyclic set to true
  • ComplexTypeElementBuilder.java - Stores complex type declaration
  • ElementDeclarationBuilder.java - Stores global element declaration
  • ElementSchematronCacheBuilder.java - Stores schematron associated with an element/type declaration
  • ElementTypeMapDataBuilder.java - Stores the relationship between the element names and their associated type.
  • GroupDeclarationBuilder.java - Stores group declarations
  • IdentityConstraintDataBuilder.java - Stores the identity constraints associated with elements
  • SchemaBindingDataBuilder.java - Used for schema binding
  • ElementSourceBuilder.java - Stores the source for definition/instance documents
  • SMLValidatingBuilder.java - Stores elements
  • SubstitutionBuilder - Used for substitution groups
  • TargetSchemaBuilder.java - Element declarations with target* constraints
  • TargetSchemaBuilder.java - Stores type declarations
  • TypeInheritanceDataBuilderImpl.java - Keeps track of type inheritance

Task Breakdown

The following tasks are required to complete this enhancement:

  1. Modify SMLMainValidator to invoke the three content handlers
  2. Create HeaderContentHandler
  3. Build the structures for HeaderContentHandler
  4. Create DefinitionContentHandler
  5. Build the structures for DefinitionContentHandler
  6. Create InstanceContentHandler
  7. Use the structures created by HeaderContentHandler and DefinitionContentHandler to invoke the data builders associated with each validator
  8. Build the structures required for sml:acyclic
  9. Complete the validator for acyclic
  10. Build the structures required for target* constraints
  11. Complete the validator for target* constraints
  12. Build the structures required for identity constraints
  13. Complete the validator for identity constraints
  14. Test to make sure all existing test cases pass

References

Open Issues/Questions

  • How do we handle validating SML-IF documents that contain only schemas?
    After conferring with Sandy Gao of the SML workgroup, the following approach was decided:
    • For SML-IF documents containing no instance documents, we need to create a dummy element with which to parse the schemas.
    • Using PSVI, we can then retrieve an object of type XSModel, which represents all schemas in a validation set.
    • The XSModel object can be examined by validators to ensure the schemas are syntactically correct.

Here are some clarifications from Sandy on the above summary:

  • The problem occurs not only when the IF contains only schema (document)s, but also when:
    • The IF contains schema documents that are not used to validate any instance document
    • The IF contains schema documents that are used to validate instance documents, but some components (e.g. element declarations or type definitions) are not used during the validation process.
  • Because it's difficult to predict whether all schema components will be used by instance documents, the dummy document should always be used for all schemas (yes, schemas, not schema documents) created according to the schema binding.
  • The XSModel represents *the schema* (singular) used to validate the instance document. That schema may be constructed from multiple schema documents. (Schema vs. schema document is an important distinction, and a source of confusion.)
  • When the schema is constructed (as part of the construction of the JAXP schema), all schema constraints are checked. If you didn't receive any error at that step, then when you get the XSModel from PSVI, it's guaranteed to be correct, both syntactically and semantically.
  • What really needs to be checked are the SML constraints specified on schema components. Both syntactic and semantic problems with SML constraints should be checked and reported, not only syntactic ones.
  • It's possible for schema documents to not be used to construct any schema. Such documents still need to be checked to make sure they are good schema documents (syntactically). If there are SML constraints specified in such schema documents, they also need to be checked syntactically. It may be sufficient to validate such documents against a schema constructed from two schema documents: the "schema for schemas" and the "schema for SML". The spec needs some clarification on the exact requirement.
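Sandy's point that schema constraints are checked when the JAXP schema is constructed can be illustrated with the JDK alone. The sketch below builds one schema from several schema documents and reports the first construction error, if any. The class name SchemaConstructionSketch and the sample schema documents are assumptions for illustration, and the sketch does not cover retrieving the XSModel, which requires the Xerces PSVI API.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaConstructionSketch {
    /**
     * Builds a single schema from the given schema documents.
     * Returns null when construction succeeds, otherwise the error message.
     */
    static String constructionError(String... schemaDocuments) {
        Source[] sources = new Source[schemaDocuments.length];
        for (int i = 0; i < schemaDocuments.length; i++) {
            sources[i] = new StreamSource(new StringReader(schemaDocuments[i]));
        }
        try {
            // With no ErrorHandler set, schema errors surface as a SAXException
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(sources);
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String xs = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'";
        // Two hypothetical schema documents with different target namespaces
        String a = xs + " targetNamespace='urn:a'><xs:element name='a' type='xs:string'/></xs:schema>";
        String b = xs + " targetNamespace='urn:b'><xs:element name='b' type='xs:int'/></xs:schema>";
        // A schema document that refers to a type that is never defined
        String bad = xs + "><xs:element name='c' type='Missing'/></xs:schema>";

        System.out.println("valid set: " + constructionError(a, b));
        System.out.println("invalid set: " + constructionError(bad));
    }
}
```

A successful construction says nothing about SML constraints on the schema components; those checks would still have to be performed by the validators.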

All reviewer feedback should go in the Talk page for 237921.