Difference between revisions of "COSMOS Design 238000"

All reviewer feedback should go in the [[Talk:COSMOS_Design_238000|Talk page for 238000]].
 
 

Latest revision as of 14:59, 8 August 2008

Support locid attribute in SML Validation

Change History

Name            Date        Revised Sections
David Whiteman  06/24/2008  Initial creation
Hubert Leung    07/22/2008  Completed design

Workload Estimation

Rough workload estimate in person weeks
Process                   Sizing  Names of people doing the work
Design                    0.5     Hubert Leung
Code                      2       Hubert Leung
Test                      1       Hubert Leung
Documentation             0
Build and infrastructure  0
Code review, etc.         0
TOTAL                     3.5

Terminologies/Acronyms

The terminologies/acronyms below are commonly used throughout this document.

Term Definition
SML Service Modeling Language
SML-IF Service Modeling Language - Interchange Format

Purpose

This document is associated with bugzilla 238000.

We need to implement the optional sml:locid attribute in our validator.

The locid attribute was introduced in SML 1.1 to support retrieving localized strings for text elements within an SML document. The specification provides an example of using locid in schematron expressions to produce localized error messages. This enhancement implements support for locid in schematron expressions in the SML validator.

Requirements

The locid attribute provides the information necessary to retrieve localized text. The specification is technology independent. Our implementation will use Java resource bundles to retrieve localized strings.

The enhancement implements the example provided in Appendix F of the SML specification: http://www.w3.org/TR/sml/#LocalizationSample

The implementation will handle the sml:locid attribute only when it is defined on an element in the schematron namespace, where it provides localized validation error messages. A locid attribute defined in any other context will not be handled.

Design

Locating the resource bundle

The sml:locid attribute will have two parts: a prefix and a key to a string.
e.g. sml:locid="lang:StudentIDErrorMsg"

A namespace URI is associated with the prefix, declared on the element itself or on one of its ancestor elements. The format of the URI, and how it is used to locate the translated resource, is outside the scope of the SML specification. How to use the URI to locate the translated resources and retrieve the appropriate value is therefore an application-specific design decision. In this implementation, we require the URI to have the following structure:

sml:<bundle name>[:<locale>]

  • the first segment "sml" is the scheme of the URI. It is a dummy value that makes the URI a well-formed absolute URI.
  • <bundle name> is the fully qualified name of a Java resource bundle
  • <locale> is the intended locale of the message. It is an optional field.

Notes:

  • URIs used in namespaces have to be in absolute form. Relative URIs are not allowed.
  • The URI format above complies with the URI syntax defined here: http://www.ietf.org/rfc/rfc2396.txt

Examples:

  • sml:org.eclipse.cosmos.rm.internal.messages.Message
  • sml:org.eclipse.cosmos.rm.internal.messages.Message:fr
  • sml:org.eclipse.cosmos.rm.internal.messages.Message:pt_BR
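
The URI structure above can be taken apart with plain string handling. The following is a minimal sketch under stated assumptions, not the validator's actual code: the class name LocidNamespace and its fields are hypothetical, and the locale segment is assumed to use the underscore form shown in the examples (e.g. pt_BR).

```java
import java.util.Locale;

/** Hypothetical helper: splits "sml:<bundle name>[:<locale>]" into its parts. */
public final class LocidNamespace {
    private static final String SCHEME = "sml:";

    public final String bundleName;
    public final Locale locale; // null when the URI has no locale segment

    private LocidNamespace(String bundleName, Locale locale) {
        this.bundleName = bundleName;
        this.locale = locale;
    }

    /** Returns null when the URI does not follow the expected structure. */
    public static LocidNamespace parse(String uri) {
        if (uri == null || !uri.startsWith(SCHEME)) {
            return null;
        }
        String rest = uri.substring(SCHEME.length());
        int colon = rest.indexOf(':');
        String bundle = colon < 0 ? rest : rest.substring(0, colon);
        if (bundle.isEmpty()) {
            return null;
        }
        if (colon < 0) {
            return new LocidNamespace(bundle, null);
        }
        String tag = rest.substring(colon + 1);
        if (tag.isEmpty()) {
            return null;
        }
        // "pt_BR" is converted to the language tag "pt-BR" before parsing.
        return new LocidNamespace(bundle, Locale.forLanguageTag(tag.replace('_', '-')));
    }

    public static void main(String[] args) {
        LocidNamespace ns = parse("sml:org.eclipse.cosmos.rm.internal.messages.Message:pt_BR");
        System.out.println(ns.bundleName + " / " + ns.locale);
    }
}
```

Returning null for a malformed URI matches the rule that a failed retrieval simply causes the sml:locid value to be ignored.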

The string retrieved from the resource bundle will replace the text content of the element, if any is present. The following two schematron rules are equivalent, assuming the string is retrieved from the resource bundle successfully:

<sch:rule context="u:Students/u:Student">
  <sch:assert test="smlfn:deref(.)[starts-with(u:ID,'99')]"
              sml:locid="lang:StudentIDErrorMsg">
    The specified ID <sch:value-of select="string(u:ID)"/> does not begin with 99.
  </sch:assert>
</sch:rule>

<sch:rule context="u:Students/u:Student">
  <sch:assert test="smlfn:deref(.)[starts-with(u:ID,'99')]"
              sml:locid="lang:StudentIDErrorMsg">
  </sch:assert>
</sch:rule>
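
The lookup that makes these two rules equivalent can be sketched as follows. This is an illustrative sketch only: the class LocidMessages and its lookup method are hypothetical names, and the fallback argument stands for whatever text content the element already carries.

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

/** Hypothetical helper: resolves a message key, falling back to the element's own text. */
public final class LocidMessages {

    /** Returns the localized string, or fallback when the bundle or the key is missing. */
    public static String lookup(String bundleName, Locale locale, String key, String fallback) {
        try {
            // With no locale in the namespace URI, the JVM default locale is used.
            ResourceBundle bundle = (locale == null)
                    ? ResourceBundle.getBundle(bundleName)
                    : ResourceBundle.getBundle(bundleName, locale);
            return bundle.getString(key);
        } catch (MissingResourceException e) {
            // Retrieval failed: behave as if sml:locid had not been specified.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // "example.missing.Messages" is a made-up bundle name that is not on the classpath.
        System.out.println(lookup("example.missing.Messages", Locale.FRENCH,
                "StudentIDErrorMsg", "The specified ID does not begin with 99."));
    }
}
```

Both ResourceBundle.getBundle and ResourceBundle.getString signal a missing bundle or key with MissingResourceException, so a single catch covers both failure cases.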

String substitution

Section 7.1 and Appendix F of the SML specification discuss the use case of string substitution in localized strings. However, the sml:locid attribute itself carries no information about string substitution. The SML specification only suggests ways to do string substitution; those suggestions are not a normative part of the specification.

The example in Appendix F of the specification embeds the schematron "value-of" element in the message to do string substitution. This implementation will follow the example closely.


Algorithm

The enhancement will change the ElementSchematronCacheBuilder data builder so that it replaces the text content of schematron elements with a translated version before passing the schematron expression to the XSLT transformer.

  • In the startElement method of ElementSchematronCacheBuilder.java, check for the presence of the sml:locid attribute if the element has the schematron namespace.
  • If the sml:locid attribute is present, attempt to retrieve the value indicated in the locid attribute from a resource bundle.
    • get prefix and message key from the attribute value
    • look up the namespace associated with the prefix (some new data structures are required to do this. SAX parsers do not provide prefix lookup directly.)
    • parse the namespace URI for bundle name and the optional locale value
    • load the resource bundle and retrieve string by message key
  • If the retrieval fails, the sml:locid value will be ignored.
  • If the retrieval is successful, then
    • append the string from the resource bundle to the rule fragment, right after the opening element tag of the current element.
    • set a flag to suppress the text element and <sch:value-of> elements from being appended to the rule fragment.
    • unset the flag in the endElement event of the element with the sml:locid attribute defined.
  • Strings in resource bundles must embed their substitution variables in a syntax that the schematron XSLT transformer can consume.
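
The prefix lookup step needs its own bookkeeping because the handler has to remember which prefixes are in scope. One possible sketch, using the startPrefixMapping and endPrefixMapping callbacks of a namespace-aware SAX parser, is shown below. It is not the actual ElementSchematronCacheBuilder code: the class LocidPrefixTracker is hypothetical, and the SML namespace constant is an assumption that should be replaced by the namespace your documents use.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: tracks prefix bindings and resolves sml:locid values. */
public final class LocidPrefixTracker extends DefaultHandler {
    static final String SCH_NS = "http://purl.oclc.org/dsdl/schematron";
    // Assumption: the SML namespace; adjust to the one your documents declare.
    static final String SML_NS = "http://www.w3.org/ns/sml";

    // prefix -> stack of namespace URIs, innermost binding on top
    private final Map<String, Deque<String>> bindings = new HashMap<>();
    // each entry is {namespace URI, message key}
    private final List<String[]> resolved = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        bindings.computeIfAbsent(prefix, p -> new ArrayDeque<>()).push(uri);
    }

    @Override
    public void endPrefixMapping(String prefix) {
        Deque<String> stack = bindings.get(prefix);
        if (stack != null && !stack.isEmpty()) {
            stack.pop();
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!SCH_NS.equals(uri)) {
            return; // only schematron elements are handled
        }
        String locid = atts.getValue(SML_NS, "locid");
        if (locid == null) {
            return;
        }
        int colon = locid.indexOf(':');
        if (colon <= 0) {
            return; // no prefix: ignore the attribute
        }
        Deque<String> stack = bindings.get(locid.substring(0, colon));
        if (stack != null && !stack.isEmpty()) {
            resolved.add(new String[] { stack.peek(), locid.substring(colon + 1) });
        }
    }

    /** Parses the fragment and returns every {namespace URI, key} pair found. */
    public static List<String[]> resolve(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            LocidPrefixTracker handler = new LocidPrefixTracker();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.resolved;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse schematron fragment", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<sch:schema xmlns:sch='" + SCH_NS + "' xmlns:sml='" + SML_NS + "'"
                + " xmlns:lang='sml:org.eclipse.cosmos.rm.internal.messages.Message:fr'>"
                + "<sch:assert test='true()' sml:locid='lang:StudentIDErrorMsg'>x</sch:assert>"
                + "</sch:schema>";
        for (String[] r : resolve(xml)) {
            System.out.println(r[0] + " -> " + r[1]);
        }
    }
}
```

A stack per prefix is used so that an inner element can rebind a prefix without losing the outer binding once the inner element ends.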

Open Issues/Questions

  • String substitution is an important part of string localization, but it is not supported by the sml:locid attribute. So SML can only claim partial support for localization.
    • I see the two as orthogonal. Ordinarily localized strings are translated, and substituted text is not translated (it comes from some user, and has a fixed language implicitly). E.g. if I give you a server system name, glyph issues notwithstanding the name should be identical regardless of the application's locale.
  • Since the mechanism for retrieving localized string is not standardized, and we have to use implementation-specific ways to handle string substitution, SML documents with localized strings are not interoperable between different implementations of validators and applications that handle the SML documents.
    • Continuing my tradition of treating them separately:
    • Localization: guilty as charged. Until there is a uniform interface across platforms for localization, I see little opportunity to do better than this. Allowing localization on certain platforms, e.g. Java using resource bundles, is a far better situation for users than no localization at all IMO.
    • Variable substitution: in some contexts, e.g. Schematron rules, there appear to be mechanisms that are consistent across platforms... the Schematron spec requires support for an XSLT-based engine in all implementations. In other contexts, e.g. an SML-IF model's displayName (i.e. in a pure XML context) it is less clear that consistent mechanisms exist, granted. As with localization, the spec authors chose a partial solution over no solution.
  • The specification suggests using xsl:variable to do string substitution (Appendix F). However, this mechanism does not work with the schematron XSL translator from http://xml.ascc.net/schematron/1.5/. The translator reports the following error:

Line #0; Column #0; org.apache.xml.utils.WrappedRuntimeException: Could not find variable with the name of var

Note that it doesn't work even without the sml:locid attribute.

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            xmlns:lang="http://www.university.example.org/translation/">
    <sch:ns prefix="u" uri="http://www.university.example.org/ns" />
    <sch:ns prefix="smlfn" uri="http://www.w3.org/2008/03/sml-function"/>
    <sch:pattern id="StudentPattern">
        <sch:rule context="u:Students/u:Student">
            <sch:assert test="smlfn:deref(.)[starts-with(u:ID,'99')]">
                <xsl:variable name="var" select="u:ID" />
                The specified ID <sch:value-of select="string($var)"/> does not begin with 99.
            </sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>

In my implementation, I expect resource bundles to have the following content (one entry per locale-specific bundle; the French and German entries are shown):

StudentIDErrorMsg = L'identifieur specifie <sch:value-of select="string(u:ID)"/> ne commence pas par 99.
StudentIDErrorMsg = Das angegebene Attributkennzeichen ID <sch:value-of select="string(u:ID)"/> beginnt nicht mit 99.
  • We need to assess whether the error is a consequence of a flawed implementation, a down-level implementation (1.5 is not the ISO version IIRC), or a limitation of Schematron as currently specified.

In my opinion, hard-coding the variable name "var" is not much better than assuming the prefix to be "u". So the recommendation to use an xsl variable for its portability is debatable. Java resource bundles use integers to index into the list of parameters, which is a more portable solution. For example:

StudentIDErrorMsg = L'identifieur specifie {0} ne commence pas par 99.
StudentIDErrorMsg = Das angegebene Attributkennzeichen ID {0} beginnt nicht mit 99.
  • I don't see "var" versus "0" as a meaningful difference. I would recommend named variables because they are named (the names presumably convey semantic meaning) and because they are insensitive to the insertion of additional preceding variables in the string (not true of positional parameters), i.e. for maintainability in both cases, not because of any supposed portability differences.
  • In either case, I would never recommend Schematron-specific markup inside a resource bundle string, so there should be no issue with the binding of the namespace prefix "u".
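
For reference, the positional {0} style discussed above is what java.text.MessageFormat consumes. The sketch below is illustrative only; the class name PositionalMessageDemo is hypothetical. It also shows a detail that matters for the French string: MessageFormat treats a single apostrophe as a quoting character, so the apostrophe has to be doubled in the pattern or the {0} placeholder is not substituted.

```java
import java.text.MessageFormat;

/** Hypothetical demo of positional parameters in resource bundle strings. */
public final class PositionalMessageDemo {

    /** Substitutes the arguments into the {0}, {1}, ... placeholders of the pattern. */
    public static String format(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }

    public static void main(String[] args) {
        // The student ID is passed as a String so that no number formatting is applied.
        String id = "12345";

        // German string: no apostrophes, so the pattern can be used as written.
        System.out.println(format(
                "Das angegebene Attributkennzeichen ID {0} beginnt nicht mit 99.", id));

        // French string: the apostrophe is doubled so MessageFormat emits a literal one.
        System.out.println(format(
                "L''identifieur specifie {0} ne commence pas par 99.", id));

        // With a single apostrophe the rest of the pattern is treated as quoted text,
        // and the placeholder is left unsubstituted.
        System.out.println(format(
                "L'identifieur specifie {0} ne commence pas par 99.", id));
    }
}
```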


