Difference between revisions of "Equinox/p2/Omni Version"

< Equinox‎ | p2
(Named Version Formats)
(Examples of Version Formats)
 
(120 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{CommentBox|W.I.P - Reworked proposal after discussion Dec 15, 2008 - Not Ready for review. DO NOT EDIT - EDITS ARE CURRENTLY MADE OFFLINE
+
{{CommentBox|'''This proposal is implemented and is part of p2 as of 3.5M5'''
Summary of changes:
+
* The previous proposal allowed multiple implementations of Version and VersionRange. The new proposal is to support only a single implementation.
+
* The 'any' format has been removed
+
* The 'raw' format has been renamed to 'format', and 'raw' is now used to describe a canonical version
+
* The pattern language has been made more powerful, since new version types cannot be plugged in
+
* A proposal for how to handle sharing of named patterns has been added
+
* An FAQ has been added at the end, answering some of the questions asked by reviewers
+
 
}}
 
}}
 
=Introduction=
 
=Introduction=
This page describes a proposal for adding support for non OSGi version and version ranges in Equinox p2. This page was created as a result of the discussion on the p2 call on Dec 1, 2008.  
+
This page describes the "Omni Version" - an implementation of the Version and VersionRange classes in Equinox p2 that enables p2 to handle versioning schemes other than OSGi. See [https://bugs.eclipse.org/bugs/show_bug.cgi?id=233699 bug 233699] for discussion.
See [https://bugs.eclipse.org/bugs/show_bug.cgi?id=233699 bug 233699] for discussion.
+
 
==Background==
 
==Background==
 
There are other versioning schemes in wide use that are not compatible with OSGi version and version ranges. The problem is both syntactic and semantic.
 
There are other versioning schemes in wide use that are not compatible with OSGi version and version ranges. The problem is both syntactic and semantic.
Line 49: Line 41:
 
These are not syntactically compatible with OSGi versions.
 
These are not syntactically compatible with OSGi versions.
  
==Current implementation in p2==
+
==Current implementation in p2 3.5M5==
The current implementation in p2 uses the classes Version and VersionRange to describe the two concepts and these are implementations handling only OSGi version type.
+
The current implementation in p2 uses the OSGi resolver to create the final step of a provisioning plan. This means that versions that cannot be converted to OSGi will cause the planner to stop with an error. This is expected to be fixed when a SAT4J based planner is used.
  
=Proposed Solution=
+
=Solution=
 
==One implementation of Version and VersionRange==
 
==One implementation of Version and VersionRange==
Equinox p2 should have one implementation of Version and one of VersionRange (called OmniVersion, and OmniVersionRange in this proposal just to give these implementation different names) capable of capturing the semantics of various version formats. The advantages over previous proposal are that there is no need to dynamically plugin new implementations, and new formats can be more easily be introduced.
+
Equinox p2 has one implementation of Version and one of VersionRange (referred to as OmniVersion and OmniVersionRange, to indicate that they are capable of capturing the semantics of various version formats). The advantages over previously proposed implementations are that there is no need to dynamically plug in new implementations, and new formats can more easily be introduced.
  
 
Even if the finished solution only requires a single implementation (the OmniVersion discussed below), there are other factors to consider. The current p2 SimplePlanner uses the OSGi planner, and it can only understand OSGi versions. There is work being done on SAT4J to enable it to be used instead of the OSGi planner (work to handle "explanations" could also be used to handle "attachments", now being done with the OSGi planner).

Even if the finished solution only requires a single implementation (the OmniVersion discussed below), there are other factors to consider. The current p2 SimplePlanner uses the OSGi planner, and it can only understand OSGi versions. There is work being done on SAT4J to enable it to be used instead of the OSGi planner (work to handle "explanations" could also be used to handle "attachments", now being done with the OSGi planner).
  
See [https://bugs.eclipse.org/bugs/show_bug.cgi?id=233699 bug 233699] for more information. A patch is now proposed that has the p2 Version and VersionRange classes broken out, and where these are translated to OSGI version and version range where needed.
+
See [https://bugs.eclipse.org/bugs/show_bug.cgi?id=233699 bug 233699] for more information.
 
+
With this patch it is possible to replace the p2 Version and VersionRange with the Omni implementation and replace SimplePlanner's use of the OSGi planner with a similar planner that handles OmniVersion. This can be done by someone who needs support for different version formats before the SAT4J solution is available.
+
  
 
==One Canonical Format==
 
==One Canonical Format==
The OmniVersion and OmniVersion range should be "universal" - all instances of Version should be comparable against each other with a fully defined (non ambiguous) ordering. The API could (as today) be based on a single string fully describing a version or version range, or use a structured approach to describe the canonical, original, and transformation rule in separate strings. In both cases, having string representation in the API is probably the best.  
+
The OmniVersion and OmniVersionRange are "universal" - all instances of Version should be comparable against each other with a fully defined (unambiguous) ordering. The API is (as today) based on a single string fully describing a version or version range.
  
This proposal is based on the assumption that a single string is used.
+
The canonical string format is called "raw" and it is explained in more detail below. To ensure backwards compatibility, as well as providing ease of use in an OSGi environment, version strings that are not prefixed with an OmniVersion keyword (e.g. "raw") have the same format and semantics as the current OSGi version format.

+
The canonical string format is called "raw" and it is explained in more detail below. To ensure backwards compatibility, as well as providing ease of use in an OSGi environment, version strings that are not prefixed with the keyword "raw" have the same format and semantics as the current OSGi version format.
+
  
 
As an example, the following two version strings are both valid input, and express exactly the same version:

As an example, the following two version strings are both valid input, and express exactly the same version:
Line 74: Line 62:
  
 
==Implementation of Omni Version and VersionRange==
 
==Implementation of Omni Version and VersionRange==
An implementation is being developed in parallel with this specification.
 
 
===OmniVersion===
 
===OmniVersion===
The OmniVersion implementation uses an Object Array to store version segments in order of descending significance. A segment is an instance of Integer, String, Comparable[], MaxInteger, MaxString, or Pad. A Pad instance has a reference to one version segment used as padding. Pad can only be placed last in a version segment array.
+
The OmniVersion implementation uses a vector to store version segments in order of descending significance. A segment is an instance of Integer, String, Comparable[], MaxInteger, MaxString, or Min.
  
 
====Comparison====
 
====Comparison====
 
Comparison is done by iterating over segments from 0 to n.  
 
Comparison is done by iterating over segments from 0 to n.  
* If a segment is a pad segment, the referenced version segment is used in the comparison.
 
 
* If segments are of different type the rule MaxInteger > Integer > Comparable[] > MaxString > String is used - the comparison is done and the version with the greater segment type is reported as greater.
 
* If segments are of different type the rule MaxInteger > Integer > Comparable[] > MaxString > String is used - the comparison is done and the version with the greater segment type is reported as greater.
 
* If segments are of equal type - they are compared - if one is greater the comparison is done and the version with the greater segment is reported as greater.
 
* If segments are of equal type - they are compared - if one is greater the comparison is done and the version with the greater segment is reported as greater.
* If the shorter of two versions has a pad segment, the extra segments in the longer version are compared against this pad segment.
+
* All versions are by default padded with -M (absolute min segment) "to infinity". A version may have an explicit pad element which is used instead of the default.
* If all segments are equal up to end of the shortest segment array, the longer version is reported as greater.
+
* A shorter version is compared to a longer by comparing the extra segments in the longer version against the shorter version's pad segment.
 +
* If all segments are equal up to end of the longest segment array, the pad segments are compared, and the version with the greater pad segment is reported as greater.
 +
* If pad segments are also equal the two versions are reported as equal.
 
* As a consequence of not including delimiters in the canonical format; two versions are equal if they only differ on delimiters.
 
* As a consequence of not including delimiters in the canonical format; two versions are equal if they only differ on delimiters.
 +
 +
As an example - here is a comparison of versions (expressed in the raw format introduced further on in the text - 'p' means that a pad element follows, and -M the absolute min segment):
 +
  1p-M < 1.0.0 < 1.0.0p0 == 1p0 < 1.1 < 1.1.1 < 1p1 == 1.1p1 < 1pM
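The comparison rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the p2 implementation: the names <tt>Sym</tt>, <tt>Version</tt>, <tt>rank</tt>, <tt>compare_segments</tt> and <tt>compare</tt> are hypothetical, sub-arrays are modelled as nested <tt>Version</tt> objects, and the type ranking is taken from the rule MaxInteger > Integer > Comparable[] > MaxString > String with -M below everything.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Tuple


class Sym(Enum):
    """Symbolic segments; the value is the rank used when segment types differ."""
    MIN = 0    # -M, absolute min
    MAXS = 2   # m, max string
    MAX = 5    # M, absolute max


@dataclass(frozen=True)
class Version:
    segments: Tuple[Any, ...]
    pad: Any = Sym.MIN  # every version is padded with -M by default


def rank(seg):
    # -M < String < MaxString < array < Integer < M
    if isinstance(seg, Sym):
        return seg.value
    if isinstance(seg, str):
        return 1
    if isinstance(seg, Version):
        return 3
    return 4  # plain integer segment


def compare_segments(a, b):
    ra, rb = rank(a), rank(b)
    if ra != rb:                 # different types: the greater type wins
        return -1 if ra < rb else 1
    if isinstance(a, Sym):       # same symbolic segment
        return 0
    if isinstance(a, Version):   # sub-arrays are compared recursively
        return compare(a, b)
    return (a > b) - (a < b)     # two integers or two strings


def compare(v, w):
    """Return -1, 0 or 1; missing segments are taken from the version's pad."""
    n = max(len(v.segments), len(w.segments))
    for i in range(n):
        a = v.segments[i] if i < len(v.segments) else v.pad
        b = w.segments[i] if i < len(w.segments) else w.pad
        c = compare_segments(a, b)
        if c != 0:
            return c
    # all segments equal up to the longest array: the pads decide
    return compare_segments(v.pad, w.pad)
```

With this sketch, <tt>compare(Version((1,), 0), Version((1, 0, 0), 0))</tt> returns 0 (1p0 == 1.0.0p0) and <tt>compare(Version((1, 1, 1)), Version((1,), 1))</tt> returns -1 (1.1.1 < 1p1), matching the ordering shown in the example line.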
  
 
====Raw and Original Version String====
 
====Raw and Original Version String====
Line 92: Line 83:
 
A version string with raw and original is written in the form:

A version string with raw and original is written in the form:
  
   'raw' ':' raw-format-string '/' original-format-string
+
   'raw' ':' raw-format-string '/' format(...):original-format-string
  
 
The p2 Engine completely ignores the original part - only the raw part is used, and the original format is only used for human consumption.
 
The p2 Engine completely ignores the original part - only the raw part is used, and the original format is only used for human consumption.
  
 
Example using a mozilla version string (as it has the most complex format encountered to date).
 
Example using a mozilla version string (as it has the most complex format encountered to date).
   raw:{1.m.0.m}.{20.'a'.3.'b'}p{0.m.0.m}/1.20a3b.a
+
   raw:<1.m.0.m>.<20.'a'.3.'b'>p<0.m.0.m>/format((<n=0;?s=m;?n=0;?s=m;?>(.<n=0;?s=m;?n=0;?s=m;?>)*)=p<0.m.0.m>;):1.20a3b.a
 +
 
 +
An original version string can be included with unknown format:
 +
  raw:<1.m.0.m>.<20.'a'.3.'b'>p<0.m.0.m>/:1.20a3b.a
  
 
See below for full explanation of the raw format.
 
See below for full explanation of the raw format.
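Since only the raw part is significant, a consumer first has to separate it from the optional original part. The following Python sketch shows one way to do that; the function name <tt>split_version_string</tt> is hypothetical, and the sketch assumes that the first '/' outside a quoted string ends the raw part (a '/' inside quotes belongs to a string segment).

```python
def split_version_string(text):
    """Split 'raw:<raw-part>/<original-part>' into (raw, original or None)."""
    if not text.startswith("raw:"):
        raise ValueError("not a raw version string")
    body = text[len("raw:"):]
    quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(body):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "/":
            # everything after the first unquoted '/' is the original part
            return body[:i], body[i + 1:]
    return body, None
```

The original part is returned unparsed (including any <tt>format(...)</tt> prefix and the ':'), since it is intended for human consumption only.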
Line 108: Line 102:
  
 
The raw-range can be followed by the original range:
 
The raw-range can be followed by the original range:
   raw-range '/' ( '[' | '(' ) original-format-string ',' original-format-string ( ']' | ')' )
+
   raw-range '/' 'format' '(' format-string ')' ':' ( '[' | '(' ) original-format-string ',' original-format-string ( ']' | ')' )
 +
 
 +
An original version range can be included with unknown format:
 +
  raw: [<1.m.0.m>.<20.m.0.m>p<0.m.0.m>,<1.m.0.m>.<20.'a'.3.'b'>p<0.m.0.m>]/:[1.20,1.20a3b.a]
  
 
The p2 Engine completely ignores the original part - only the raw part is used, and the original format is only used for human consumption.
 
The p2 Engine completely ignores the original part - only the raw part is used, and the original format is only used for human consumption.
Line 114: Line 111:
 
See below for full explanation of the raw format.
 
See below for full explanation of the raw format.
  
 +
====Other range formats====
 
Note that some version schemes have range concepts where the notion of inclusive or exclusive does not exist, and instead use symbolic markers such as "next larger", "next smaller", or use wildcards to define ranges.
 
Note that some version schemes have range concepts where the notion of inclusive or exclusive does not exist, and instead use symbolic markers such as "next larger", "next smaller", or use wildcards to define ranges.
 
In these cases, the publisher of an IU must use discrete versions and the inclusive/exclusive notation to define the same range.
 
In these cases, the publisher of an IU must use discrete versions and the inclusive/exclusive notation to define the same range.
  
==Version Formats==
+
Some range specifications allow a union of ranges, or the exclusion of certain versions. This is not yet supported by p2. If introduced, it could be expressed as a series of ranges where a ^ before a range negates it. Example: [0,1][3,10]^[3.1,3.7) is equivalent to [0,10]^(1,3)^[3.1,3.7)
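The equivalence of the two notations in the example can be checked with a small Python sketch. This is purely illustrative, since p2 does not support such ranges: the helpers <tt>in_range</tt> and <tt>matches</tt> are hypothetical, versions are modelled as tuples of integers, and a version is assumed to match when it lies in at least one plain range and in none of the negated (^) ranges.

```python
def in_range(v, lo, hi, lo_inclusive=True, hi_inclusive=True):
    """Membership test for one range with inclusive or exclusive bounds."""
    above = v >= lo if lo_inclusive else v > lo
    below = v <= hi if hi_inclusive else v < hi
    return above and below


def matches(v, includes, excludes):
    """True if v is in any included range and in no negated (^) range."""
    return (any(in_range(v, *r) for r in includes)
            and not any(in_range(v, *r) for r in excludes))


# [0,1][3,10]^[3.1,3.7)
LEFT = ([((0,), (1,)), ((3,), (10,))],
        [((3, 1), (3, 7), True, False)])
# [0,10]^(1,3)^[3.1,3.7)
RIGHT = ([((0,), (10,))],
         [((1,), (3,), False, False), ((3, 1), (3, 7), True, False)])

SAMPLES = [(0,), (1,), (1, 5), (2,), (3,), (3, 1), (3, 5), (3, 7), (10,), (11,)]
```

For every version in <tt>SAMPLES</tt>, <tt>matches(v, *LEFT)</tt> and <tt>matches(v, *RIGHT)</tt> give the same answer.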
There are two basic formats ''default osgi string format'', and ''raw canonical string format''.
+
  
The following definitions are common to all the definitions below:
+
==Format Specification==
 +
There are two basic formats ''default osgi string format'', and ''raw canonical string format''. There are also two corresponding range formats osgi-version-range, and raw-version-range.
 +
 
 +
The raw format is a string representation of the internally used format - it consists of the keyword "raw", followed by a list of entries separated by period.
 +
An entry can be numerical, quoted alphanumerical, or a sub canonical list in the same format.
 +
A canonical version (and sub canonical version arrays) can be padded to infinity with a special padding element.
 +
Special entries express the notion of 'max integer' and 'max string'.
 +
 
 +
The osgi string format is the well known format in current use.
 +
 
 +
'''The raw format in BNF:'''
 +
<pre>
 
   digit: [0-9];
 
   digit: [0-9];
 
   letter: [a-zA-Z];
 
   letter: [a-zA-Z];
Line 128: Line 136:
 
   delimiter: [^0-9a-zA-Z];
 
   delimiter: [^0-9a-zA-Z];
 
   character: .;
 
   character: .;
   characters .+
+
   characters .+;
   quoted-string: ("[^"]*")|('[^']*')  // i.e a sequence of characters quoted with " or ', where ' can be used in a " quoted string and vice versa
+
   quoted-string: ("[^"]*")|('[^']*'); // i.e a sequence of characters quoted with " or ', where ' can be used in a " quoted string and vice versa
 +
  range-safe-string:  TBD; // a sequence of any characters but with ',' ']', ')' and '\' escaped with '\';
 
   sq: ['];
 
   sq: ['];
 
   dq: ["];
 
   dq: ["];
  
{| {{Greytable}}
+
  version :
|-valign="top" style="background-color:#eeeeee;color:#444444;"
+
      | osgi-version
| '''format string'''
+
      | raw-version
| '''description'''
+
      ;
|-valign="top"
+
  osgi-version :
{{Command|'' ''}}
+
      | numeric
| A version string that starts with a digit is an OSGi version string. It is backwards compatible. The input string "1.0.0r1234" is an example of an osgi version string.
+
      | numeric '.' numeric
|-valign="top"
+
      | numeric '.' numeric '.' numeric
{{Command|raw}}
+
      | numeric '.' numeric '.' numeric '.' .+
| The raw format is a string representation of the internally used format - it consists of the keyword "raw", followed by a list of entries separated by period.
+
      ;
An entry can be numerical, quoted alphanumerical, or a sub canonical list in the same format.
+
  raw-version :
A canonical version (and sub canonical version arrays) can be padded to infinity with a special padding element.
+
      | 'raw' ':' raw-segments optional-original-version
Special entries express the notion of 'max integer' and 'max string'.
+
      ;
 +
  optional-original-version :
 +
      |
 +
      | '/' original-version
 +
      ;
 +
  version-range :
 +
      | osgi-version-range
 +
      | raw-version-range
 +
      ;
 +
  rs : ('[' | '(') ;
 +
  re : (']' | ')') ;
  
The raw canonical form has the following format:
+
  osgi-version-range :  
<pre>
+
       | rs osgi-version ',' osgi-version re
    raw-format :
+
       | 'raw' ':' raw-elements [ pad-element ]
+
 
       ;
 
       ;
    raw-elements :
+
  raw-version-range :
 +
      | 'raw' ':' rs raw-segments ',' raw-segments re optional-original-range
 +
      ;
 +
  optional-original-range :
 +
      |
 +
      | '/' original-range
 +
      ;
 +
 
 +
  raw-segments :
 +
      | raw-elements optional-pad-element
 +
      ;
 +
  raw-elements :
 
       | raw-elements '.' raw-element
 
       | raw-elements '.' raw-element
 
       | raw-element
 
       | raw-element
Line 159: Line 187:
 
       | numeric
 
       | numeric
 
       | quoted-strings  // strings are concatenated
 
       | quoted-strings  // strings are concatenated
       | '{' raw-elements [ pad-element ] '}'  // subvector of elements
+
       | '<' raw-elements optional-pad-element '>'  // subvector of elements
       | 'm'  // symbolic 'maxs' == max string
+
       | 'm'  // symbolic 'maxs' == max string  
       | 'M'  // symbolic 'max' i.e. max > maxs
+
       | 'M'  // symbolic 'absolute max' i.e. max > MAX_INT > maxs
 +
      | '-M' // symbolic 'absolute min' i.e. -M <  empty string < array <  int
 +
      ;
 +
  optional-pad-element :
 +
      |
 +
      | pad-element
 
       ;
 
       ;
 
   quoted-strings :
 
   quoted-strings :
    | quoted-strings quoted-string
+
      | quoted-strings quoted-string
    | quoted-string
+
      | quoted-string
    ;
+
      ;
 
   pad-element :
 
   pad-element :
    | 'p' raw-element
+
      | 'p' raw-element
    ;
+
      ;
 +
 
 +
  original-version :
 +
      | optional-format-definition ':' .*
 +
      ;
 +
  original-range :
 +
      | optional-format-definition ':' rs range-safe-string ',' range-safe-string re
 +
      ;
 +
  optional-format-definition :
 +
      |
 +
      | format-definition
 +
      ;
 +
  format-definition :
 +
      | 'format' '(' pattern ')'
 +
      ;
 +
 
 +
  // Definition of parsing patterns
 +
  //
 +
  pattern :
 +
      | pattern pattern-element
 +
      | pattern-element
 +
      ;
 +
  pattern-element :
 +
      | pelem optional-processing-rules optional-pattern-range
 +
      | '[' pattern ']' processing-rules
 +
      ;
 +
  optional-processing-rules :
 +
      | optional-processing-rules '=' processing-rule ';'
 +
      | '=' processing-rule ';'
 +
      |
 +
      ;
 +
  optional-pattern-range :
 +
      | repeat-range
 +
      |
 +
      ;
 +
 
 +
  pelem :
 +
      | 'r' | 'd' | 'p' | 'a' | 's' | 'S' |  'n' | 'N' | 'q'
 +
      | '(' pattern ')'
 +
      | '<' pattern '>'
 +
      | delimiter
 +
      ;
 +
  repeat-range :
 +
      | '?' | '*' | '+'
 +
      | '{' exact '}'
 +
      | '{' at-least ',' '}'
 +
      | '{' at-least ',' at-most '}'
 +
      ;
 +
 
 +
  exact : at-least : at-most : numeric ;
 +
 
 +
  processing-rule :
 +
      | raw-element
 +
      | pad-element
 +
      | '!'
 +
      | '[' char-list ']'
 +
      | '[' '^' char-list ']'
 +
      | '{' exact '}'  // for character count
 +
      | '{' at-least ',' '}'
 +
      | '{' at-least ',' at-most '}'
 +
      ;
 +
  char-list: TBD ; // Sequence of any character but with '^', ']' and '\' escaped with '\'
 +
  delimiter :
 +
      | [!#$%&/=^,.;:-_ ] // Any non-alpha-num that has no special meaning
 +
      | quoted-string
 +
      | '\' .  // any escaped character
 +
      ;
 +
 
 
</pre>
 
</pre>
 +
  
 
Examples:
 
Examples:
 
* OSGi 1.0.0.r1234 is expressed as raw:1.0.0.'r1234'
 
* OSGi 1.0.0.r1234 is expressed as raw:1.0.0.'r1234'
 
* apache/triplet style 1.2.3 is expressed as raw:1.2.3.m
 
* apache/triplet style 1.2.3 is expressed as raw:1.2.3.m
* mozilla style 1a.2a3c. can be expressed as raw:{1.a.0.m}.{2.a.3.c}p{0.m.0.m} (mozilla is a complex format - see external links at the end of page).
+
* mozilla style 1a.2a3c. can be expressed as raw:<1.'a'.0.m>.<2.'a'.3.'c'>p<0.m.0.m> (mozilla is a complex format - see external links at the end of page).
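To make the raw grammar more concrete, here is a minimal recursive-descent sketch in Python for the part of a raw string that follows <tt>raw:</tt>. It is an illustration only, not the p2 parser: the names <tt>Sym</tt> and <tt>parse_raw</tt> are hypothetical, a sub-array is returned as a <tt>(segments, pad)</tt> pair, a missing pad is reported as <tt>None</tt>, and numeric segments are assumed to be optionally signed integers.

```python
from enum import Enum


class Sym(Enum):
    MIN = "-M"   # absolute min
    MAXS = "m"   # max string
    MAX = "M"    # absolute max


def parse_raw(text):
    """Parse raw-segments and return (segments, pad)."""
    segments, pad, pos = _segments(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected input at offset {pos}")
    return segments, pad


def _segments(s, i):
    # raw-elements: raw-element ('.' raw-element)* followed by an optional pad
    element, i = _element(s, i)
    segments = [element]
    while i < len(s) and s[i] == ".":
        element, i = _element(s, i + 1)
        segments.append(element)
    pad = None
    if i < len(s) and s[i] == "p":
        pad, i = _element(s, i + 1)
    return tuple(segments), pad, i


def _element(s, i):
    if i >= len(s):
        raise ValueError("unexpected end of input")
    if s.startswith("-M", i):
        return Sym.MIN, i + 2
    c = s[i]
    if c.isdigit() or c == "-":
        j = i + 1
        while j < len(s) and s[j].isdigit():
            j += 1
        return int(s[i:j]), j
    if c in "'\"":
        parts = []  # adjacent quoted strings are concatenated
        while i < len(s) and s[i] in "'\"":
            end = s.index(s[i], i + 1)  # ValueError if the quote is not closed
            parts.append(s[i + 1:end])
            i = end + 1
        return "".join(parts), i
    if c == "<":
        segments, pad, i = _segments(s, i + 1)
        if i >= len(s) or s[i] != ">":
            raise ValueError("missing '>'")
        return (segments, pad), i + 1
    if c == "M":
        return Sym.MAX, i + 1
    if c == "m":
        return Sym.MAXS, i + 1
    raise ValueError(f"unexpected character {c!r} at offset {i}")
```

For instance, <tt>parse_raw("1.0.0.'r1234'")</tt> returns <tt>((1, 0, 0, 'r1234'), None)</tt>, and the Mozilla-style example above yields two sub-arrays plus a sub-array pad.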
|}
+
 
 +
==Format Pattern Explanation==
 +
Here are explanations for the rules in format(pattern).
  
==Other formats==
 
TBD
 
 
{| {{Greytable}}
 
{| {{Greytable}}
 
|-valign="top" style="background-color:#eeeeee;color:#444444;"
 
|-valign="top" style="background-color:#eeeeee;color:#444444;"
| '''format string'''
+
| '''rule'''
 
| '''description'''
 
| '''description'''
 
|-valign="top"
 
|-valign="top"
{{Command|format(<transformation-pattern>)}}
+
{{Command|r}}
| Specifies a version format consisting of a ''transformation pattern''.
+
| raw - matches one ''raw-element'' as specified by the <tt>raw</tt> format. The 'r' rule does not match a pad element - use 'p' for this.
The ''transformation pattern'' can contain the following rules:
+
|-valign="top"
* <tt>r</tt> - raw - matches one ''raw-element'' as documented for the <tt>raw</tt> format. If the matching 'r' element is the last element in the input string it may also be a ''pad-element''.
+
{{Command|<nowiki>'</nowiki>''characters''<nowiki>'</nowiki>}}
* <tt>'</tt>''character(s)''<tt>'</tt> - matches a single character or sequence of characters - the matched result is not included in the resulting canonical vector (i.e. it is not a segment). A '\\' is needed to include a single '\'. The sequence of chars acts as one delimiter.
+
| matches a single character or sequence of characters - the matched result is not included in the resulting canonical vector (i.e. it is not a segment). A '\\' is needed to include a single '\'. The sequence of chars acts as one delimiter.
* ''non-alphanum character'' - matches any non alpha-numerical character (including space) -  the matched result is not included in the canonical vector (i.e. it is not a segment). A non alphanumerical character acts as a delimiter. Special characters must be escaped when wanted as delimiters.
+
|-valign="top"
* <tt>a</tt> - auto - a sequence of digits creates a numeric segment, a sequence of alphabetical characters creates a string segment. Segments are delimited by any character not having the same character class as the first character in the sequence, or by the following delimiter. A numerical sequence ignores leading zeros. If a string segment starts with ' or " the string is treated as a quoted string, and the segment is delimited by the same character (the enclosing quotes are not part of the resulting string).
+
{{Command|''non-alphanum character''}}
* <tt>d</tt> - delimiter; matches any non alpha-numeric character.  
+
|matches any non alpha-numerical character (including space) -  the matched result is not included in the canonical vector (i.e. it is not a segment). A non alphanumerical character acts as a delimiter. Special characters must be escaped when wanted as delimiters.
* <tt>s</tt> - a string group matching any character except any following explicit/optional delimiter
+
|-valign="top"
* <tt>n</tt> - a numeric (integer) group. Leading zeros are ignored.
+
{{Command|a}}
* <tt>p</tt> - parses an explicit ''pad-element'' in the input string as defined by the raw format. To define an implicit pad as part of the pattern use the processing instruction <tt>=pad()</tt>. A pad element can only be last in the overall version string, or last in a sub array.
+
| auto - a sequence of digits creates a numeric segment, a sequence of alphabetical characters creates a string segment. Segments are delimited by any character not having the same character class as the first character in the sequence, or by the following delimiter. A numerical sequence ignores leading zeros.  
* <tt>q</tt> - smart quoted string - if the first character of the string segment is a non alphanumeric character, the string is delimited by the same character. Brackets and parenthesises (i.e. (), {}, [], <>) are handled as pairs, thus 'q' matches "<andrea-doria>" and produces a single string segment with the text 'andrea-doria'.
+
|-valign="top"
* <tt>( )</tt> - indicates a group
+
{{Command|d}}
* <tt>< ></tt> - indicates a group, where the resulting elements of the group is placed in an array, and the array is one resulting element in the enclosing result
+
| delimiter; matches any non alpha-numeric character. The matched result is not included in the resulting canonical vector (i.e. it is not a segment).
* <tt>?</tt> - zero to one occurrence of the preceding rule
+
|-valign="top"
* <tt>*</tt> - zero to many occurrences of the preceding rule
+
{{Command|s}}
* <tt>+</tt> - one to many occurrences of the preceding rule
+
|a string group matching only alpha characters (i.e. "letters"). Use processing rules  =[]; or =[^] to define the set of allowed characters. It is possible to allow inclusion of delimiter chars, but not inclusion of digits.
* <tt>{n}</tt> - exactly n occurrences of the preceding rule
+
|-valign="top"
* <tt>{n,}</tt> - at least n occurrences of the preceding rule
+
{{Command|S}}
* <tt>{n,m}</tt> -  at least n occurrences of the preceding rule, but not more than m times
+
|a string group matching any group of characters. Use processing rules  =[]; or =[^] to define the set of allowed characters. Care must be taken to specify exclusion of a delimiter if elements are to follow the 'S'.
* <tt>{%n}</tt> - exactly n characters matching the preceding 's', 'n', 'q', or 'a' rule rule. For 'q' and quoted 'a', the quotes does not count.
+
|-valign="top"
* <tt>{%n,}</tt> - at least n characters matching the preceding 's', 'n', 'q', or 'a' rule rule. For 'q' and quoted 'a', the quotes does not count.
+
{{Command|n}}
* <tt>{%n,m}</tt> -  at least n characters matching the preceding 's', 'n', 'q', or 'a' rule , but not more than m characters. For 'q' and quoted 'a', the quotes does not count.
+
| a numeric (integer) group with value >= 0. Leading zeros are ignored.
* <tt>[ ]</tt> - short hand notation for an optional group. Is equivalent to ()?
+
|-valign="top"
* <tt>=</tt>''processing''<tt>;</tt> - an additional processing rule is applied to the preceding rule. The ''processing'' part can be:
+
{{Command|N}}
** ''raw-element'' - use this ''raw-element'' (as defined by the raw format) as the default value if input is missing
+
| a possibly negative value numeric (integer) group. Leading zeros are ignored.
** <tt>ignore</tt> - if input is present do not turn it into a segment (i.e. ignore what was matched)
+
|-valign="top"
** <tt>[<list of chars>]</tt> - when applied to a 'd' defines the set of delimiters. The characters ], ^, and \ must be escaped with \ to be used in the list of chars. and Example d=[+-/];
+
{{Command|p}}
** <tt>[^<list of chars>]</tt> - when applied to a 'd' defines the set of delimiters to be all non alpha numeric except the listed characters. The characters ], ^, and \ must be escaped with \ to be used in the list of chars. Example d=[^$]
+
| parses an explicit ''pad-element'' in the input string as defined by the raw format. To define an implicit pad as part of the pattern use the processing instruction <tt>=p...;</tt>. A pad element can only be last in the overall version string, or last in a sub array.
** <tt>pad(</tt>''raw-element''<tt>)</tt> - defines "padding to infinity with specified raw-element" when applied to an array, or a group enclosing the entire format. Example <tt>format((n.s)=pad(max);)</tt>
+
|-valign="top"
* <tt>\</tt> - escape removes the special meaning of a character and must be used if a special character is wanted as a delimiter. A '\\' is needed to include a '\'. Escaping a non special character is superflous but allowed.
+
{{Command|q}}
 +
| smart quoted string - matches a quoted alphanumeric string where the quote is determined by the first character of the string segment. The quote must be a non alphanumeric character, and the string must be delimited by the same character, except that brackets and parentheses (i.e. (), {}, [], <>) are handled as pairs; thus 'q' matches "<andrea-doria>" and produces a single string segment with the text 'andrea-doria'. A non-quoted sequence of characters is not matched by 'q'.
 +
|-valign="top"
 +
{{Command|()}}
 +
| indicates a group
 +
|-valign="top"
 +
{{Command|< >}}
 +
| indicates a group, where the resulting elements of the group are placed in an array, and the array is one resulting element in the enclosing result
 +
|-valign="top"
 +
{{Command|?}}
 +
| zero to one occurrence of the preceding rule
 +
|-valign="top"
 +
{{Command|*}}
 +
| zero to many occurrences of the preceding rule
 +
|-valign="top"
 +
{{Command|+}}
 +
| one to many occurrences of the preceding rule
 +
|-valign="top"
 +
{{Command|{n}}}
 +
| exactly n occurrences of the preceding rule
 +
|-valign="top"
 +
{{Command|{n,}}}
 +
| at least n occurrences of the preceding rule
 +
|-valign="top"
 +
{{Command|{n,m}}}
 +
| at least n occurrences of the preceding rule, but not more than m times
 +
|-valign="top"
 +
{{Command|<nowiki>[ ]</nowiki>}}
 +
| short hand notation for an optional group. Is equivalent to ()?
 +
|-valign="top"
 +
{{Command|<nowiki>=</nowiki>''processing'';}}
 +
| an additional processing rule is applied to the preceding rule. The ''processing'' part can be:
 +
* ''raw-element'' - use this ''raw-element'' (as defined by the raw format) as the default value if input is missing. The default value does not have to be of the same type (e.g. "s=123;?" produces an integer segment of value 123 if the optional s is not matched).
 +
* <tt>!</tt> - if input is present do not turn it into a segment (i.e. ignore what was matched)
 +
* <tt>[<list of chars>]</tt> - when applied to a 'd' defines the set of delimiters. The characters ], ^, and \ must be escaped with \ to be used in the list of chars. Example: d=[+-/]; One or several ranges of characters such as "a-z" can also be used. Example: d=[a-zA-Z0-9_-];
 +
* <tt>[^<list of chars>]</tt> - when applied to a 'd' defines the set of delimiters to be all non alpha numeric except the listed characters. The characters ], ^, and \ must be escaped with \ to be used in the list of chars. One or several ranges of characters such as "a-z" can also be used. Example d=[^$]
 +
* <tt>p</tt>''raw-element''<tt></tt> - defines "padding to infinity with specified raw-element" when applied to an array, or a group enclosing the entire format. Example <tt>format((n.s)=pM;)</tt> The pad processing rule is only applied to a parsed array, not to a default value for an array. If padding is wanted in the default array value, it can be expressed explicitly in the default value.
 +
* <tt>{n} {n,} {n,m}</tt> character ranges - with the same meaning as the rules with the same syntax, but limits the range in characters matched in the preceding 's', 'S', 'n', 'N', 'q', or 'a' rules. For 'q' the quotes do not count.
 +
|-valign="top"
 +
{{Command|<nowiki>\</nowiki>}}
 +
| escape removes the special meaning of a character and must be used if a special character is wanted as a delimiter. A '\\' is needed to include a '\'. Escaping a non special character is superfluous but allowed.
 +
|}
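As a rough illustration of the 'a' (auto) rule from the table, the following Python sketch splits an input into numeric and string segments by character class. It is a simplification under stated assumptions, not the pattern engine: the function name <tt>auto_segments</tt> is hypothetical, every non-alphanumeric character is treated as a delimiter that produces no segment, and quoting and processing rules are ignored.

```python
import re


def auto_segments(text):
    """Split text into int segments (digit runs) and str segments (letter runs)."""
    segments = []
    for token in re.findall(r"[0-9]+|[A-Za-z]+", text):
        # int() drops leading zeros, as the 'a' and 'n' rules require
        segments.append(int(token) if token.isdigit() else token)
    return segments
```

Under these assumptions, <tt>auto_segments("1.20a3b")</tt> gives <tt>[1, 20, 'a', 3, 'b']</tt>, because a change of character class ends a segment even without a delimiter.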
  
 
Additional rules:
 
Additional rules:
* if the rule produces null segments, they are not placed in the result vector e.g. format(ndddn):10-/-12 => raw:10.12
+
* if a rule produces a null segment, it is not placed in the result vector e.g. format(ndddn):10-/-12 => raw:10.12
 
* Processing (i.e. default values) applied to a group has higher precedence than individual processing inside the group if the entire group was not successfully matched.
 
* Processing (i.e. default values) applied to a group has higher precedence than individual processing inside the group if the entire group was not successfully matched.
 
* Parsing is greedy - format(n(.n)*(.s)*) will interpret 1.2.3.hello as raw:1.2.3.'hello' (as opposed to being reluctant which would produce raw:1.'2'.'3'.'hello')
 
* Parsing is greedy - format(n(.n)*(.s)*) will interpret 1.2.3.hello as raw:1.2.3.'hello' (as opposed to being reluctant which would produce raw:1.'2'.'3'.'hello')
|}
+
* When combining N with ={...}; and the input has a negative number, the "-" is included in the character count - "format(N{3}N{2}):-1234" results in "raw:-123.4"
 +
* When combining n or N with ={...} and input has leading zeros - these are included in the character count.
 +
* An empty version string is always considered to be an error.
 +
* A format that produces no segments is always considered to be an error.
 +
 
 +
Note about white space in the raw format:
 +
* white space is accepted inside quoted strings - i.e. "1.'a string'" is allowed, but not "1.  2"
 +
* white space is accepted between version range delimiters and version strings - i.e. [ 1.0, 2.0 ] is allowed.
 +
 
 
'''Note about timestamps'''
 
'''Note about timestamps'''
 
An earlier proposal had a 't' rule, but this rule has been deprecated because of the complexity. Instead, the creator of an IU should simply use 's' or 'n' and ensure comparability by using a fixed number of characters when choosing 's' format.
 
An earlier proposal had a 't' rule, but this rule has been deprecated because of the complexity. Instead, the creator of an IU should simply use 's' or 'n' and ensure comparability by using a fixed number of characters when choosing 's' format.
  
===Named Version Formats===
+
===Examples of Version Formats===
Named version formats makes it easier to enter version strings. There should be a number of predefined names as shown in the table below.
+
Here are examples of various version formats expressed using the format pattern notation. The examples also show a proposed notation of using aliases for formats. (See the section 'Tooling Support')
 
{| {{Greytable}}
 
{| {{Greytable}}
 
|-valign="top" style="background-color:#eeeeee;color:#444444;"
 
|-valign="top" style="background-color:#eeeeee;color:#444444;"
 
| '''type name'''
 
| '''type name'''
| style="width:200px;"| '''pattern'''  
+
| style="width:230px;"| '''pattern'''  
 
| '''comment'''
 
| '''comment'''
 
|- valign="top"
 
|- valign="top"
 
{{Command|osgi}}
 
{{Command|osgi}}
| n[.n=0;[.n=0;[.s]]]
+
| n[.n=0;[.n=0;[.S=[a-zA-Z0-9_-];]]]
 
|  Example: the following are equivalent:
 
|  Example: the following are equivalent:
* format(n[.n=0;[.n=0;[.s]]]):1.0.0.r1234
+
* format(n[.n=0;[.n=0;[.S=[a-zA-Z0-9_-];]]]):1.0.0.r1234
 
* raw:1.0.0.'r1234'
 
* raw:1.0.0.'r1234'
 
* osgi:1.0.0.r1234
 
* osgi:1.0.0.r1234
 
* 1.0.0.r1234
 
* 1.0.0.r1234
 +
 
|-valign="top"
 
|-valign="top"
 
{{Command|triplet}}
 
{{Command|triplet}}
| n[.n=0;[.n=0;[.s=M;]]]
+
| n[.n=0;[.n=0;]][d?S=M;]
| A variation on OSGi, with the same syntax, but where the a lack of qualifier > any qualifier. The following are all equivalent:
+
| A variation on OSGi, with the same syntax, but where a lack of qualifier > any qualifier, and the qualifier may contain any character. The following are all equivalent:
* format(n[.n=0;[.n=0;[.s=M;]]]):1.0.0
+
* format(n[.n=0;[.n=0;]][d?S=M;]):1.0.0
 
* raw:1.0.0.M
 
* raw:1.0.0.M
 
* triplet:1.0.0
 
* triplet:1.0.0
 +
|-valign="top"
 +
{{Command|jsr277}}
 +
| n(.n=0;){0,3}[-S=m;]
 +
| As defined by JSR 277 - but is provisional and subject to change as it is expected that compatibility with OSGi will be solved (they are now incompatible because of the fourth numeric field with default value 0). The jsr277 format is similar to triplet, but with 4 numeric segments and a '-' separating the qualifier to allow input of "1-qualifier" to mean "1.0.0.0-qualifier". As in triplet, a lack of qualifier > any qualifier. The following are all equivalent:
 +
* format(n(.n=0;){0,3}[-S=m;]):1.0.0
 +
* raw:1.0.0.0.M
 +
* jsr277:1.0.0
 
|-valign="top"
 
|-valign="top"
 
{{Command|tripletSnapshot}}
 
{{Command|tripletSnapshot}}
| n[.n=0;[.n=0;[-n=M;.s=m;]]]  
+
| n[.n=0;[.n=0;[-n=M;.S=m;]]]  
 
| Format used when maven transforms versions like 1.2.3-SNAPSHOT into 1.2.3-<buildnumber>.<timestamp> ensuring that it is compatible with triplet format if missing <buildnumber>.<timestamp> at the end (format produces max, max-string if they are missing).
 
| Format used when maven transforms versions like 1.2.3-SNAPSHOT into 1.2.3-<buildnumber>.<timestamp> ensuring that it is compatible with triplet format if missing <buildnumber>.<timestamp> at the end (format produces max, max-string if they are missing).
 
Example: the following are equivalent:
 
Example: the following are equivalent:
* format(n[.n=0;[.n=0;[-n=M;.s=m;]]]):1.2.3-45.20081213:1233
+
* format(n[.n=0;[.n=0;[-n=M;.S=m;]]]):1.2.3-45.20081213:1233
 
* raw:1.2.3.45.'20081213:1233'
 
* raw:1.2.3.45.'20081213:1233'
 
* tripletSnapshot:1.2.3-45.20081213:1233
 
* tripletSnapshot:1.2.3-45.20081213:1233
 
|-valign="top"
 
|-valign="top"
 
{{Command|rpm}}
 
{{Command|rpm}}
| [n:]a(d?a)*[-n[ds=ignore;]]
+
| <[n:]a(d?a)*>[-n[dS=!;]]
| RPM format matches [EPOCH:]VERSION-STRING[-PACKAGE-VERSION], where epoch is optional and numeric, version-string is auto matched to arbitrary depth >= 1, followed by a package-version, which consists of a buildnumber separated by any separator from trailing platform specification, or the string 'src' to indicate that the package is a souce package. This format allows the platform and src part to be included in the version string, but if present it is not used in the comparisons. The platform type vs source is expected to be encoded elsewhere in such an IU.
+
| RPM format matches [EPOCH:]VERSION-STRING[-PACKAGE-VERSION], where epoch is optional and numeric, version-string is auto matched to arbitrary depth >= 1, followed by a package-version, which consists of a build number separated by any separator from a trailing platform specification, or the string 'src' to indicate that the package is a source package. This format allows the platform and src part to be included in the version string, but if present it is not used in the comparisons. The platform type vs source is expected to be encoded elsewhere in such an IU. Everything except the build number is placed in an array because the build number is only compared if there is a tie.
  
 
An example of equivalent expressions:
 
An example of equivalent expressions:
* format([n:]a(d?a)*[-n[ds=ignore;]]):33:1.2.3a-23/i386
+
* format(<[n:]a(d?a)*>[-n[dS=!;]]):33:1.2.3a-23/i386
* raw:33.1.1.3.'a'.23
+
* raw:<33.1.2.3.'a'>.23
 
|-valign="top"
 
|-valign="top"
 
{{Command|mozilla}}
 
{{Command|mozilla}}
| (<n=0;?s=m;?n=0;?s=m;?>(.<n=0;?s=m;?n=0;?s=m;?>)*)=pad(<n=0;?s=m;?n=0;?s=m;?>)
+
| (<n=0;?s=m;?n=0;?s=m;?>(.<n=0;?s=m;?n=0;?s=m;?>)*)=p<0.m.0.m>;
| Mozilla versions are somewhat complicated, it consists of 1 or more parts separated by period. Each part consists of 4 optional 'fragments' (numeric, string, numeric,string), where numeric fragments are 0 if missing, and string fragments are MAX-STRING if missing. The versions use padding when compared with longer versions so that 1 == 1.0 == 1.0.0 == 1.0.0.0 etc.
+
| Mozilla versions are somewhat complicated: they consist of 1 or more parts separated by periods. Each part consists of 4 optional 'fragments' (numeric, string, numeric, string), where numeric fragments are 0 if missing, and string fragments are MAX-STRING if missing. The versions use padding so that 1 == 1.0 == 1.0.0 == 1.0.0.0 etc.
 
|-valign="top"
 
|-valign="top"
 
{{Command|string}}
 
{{Command|string}}
| s
+
| S
 
| Perhaps superfluous, but makes this version format appear in a selectable list of formats. 

| Perhaps superfluous, but makes this version format appear in a selectable list of formats. 
 
|-valign="top"
 
|-valign="top"
Line 277: Line 435:
 
|}
 
|}
  
The ''version range delimiters'' are: '(', ')', '[',  ']' and , ',' (comma).
+
==Tooling Support==
 +
The OmniVersion is not designed to be extended. Earlier we proposed that it should be possible to define named aliases for common formats and that these formats should be parseable by the OmniVersion parser. The reason for introducing aliases was to make it possible for users to enter something like "triplet:1.0.0" instead of entering the more complicated format. This did, however, raise a lot of questions: Who can define an alias? What if the definition of the alias is changed? Where are the alias definitions found? Is it possible to work at all with a version that is using only an alias - what if I want to modify a range and do not have access to the alias?
  
==Defining named formats==
+
We instead propose that alias handling is a tooling concern. Tooling should keep a registry of known formats. When a version is to be presented, the format string is "reverse looked up" in the registry - and the alias name can be presented instead of the actual format. This way, the version is always self describing.
An IU can define new named formats. The named formats are defined by using a list of defined format names, and then one property per format.
+
There is still the need to get "well known formats" and make them available in order to make it easier to use non OSGi versions in publishing tools - but there is no absolute requirement to support this in all publishing tools (some may even operate in a domain where version format is implied by the domain) - and there is no "breakage" because an alias is missing.
  
org.equinox.p2.version.formats=<format-name>, <format-name>, ...
+
Tooling support can be as simple as just having preferences where formats are associated with names - the user can enter new formats and aliases. Some import mechanism is probably also nice to have. Further ideas could be that aliases can be published as IU's and installed (i.e install a preference).
org.equinox.p2.version.format.<format-name>=formatstring
+
 
+
Once the format has been specified, it may be used in the IU. The format name should use java package name semantics to ensure unintentional clashes. When using the format names, the user may specify the last part of the name if it is unique. The predefined named formats should not be included in the formats property.
+
 
+
Thus, if an IU introduces the two named formats "org.mycorp.docver", and "org.mycorp.dbdataver" they are described like this:
+
org.equinox.p2.version.formats=org.mycorp.docver, org.mycorp.dbdataver
+
org.equinox.p2.version.format.org.mycorp.docver=n.n[s=max;]
+
org.equinox.p2.version.format.org.mycorp.dbdataver=n.'R'n[.s=ignore;]
+
 
+
When an IU is stored in a repository, the following processing is done:
+
* The defined formats are extracted from the IU
+
* If the fomat name does not already exist in the repository, it is added to the repositories list of contained formats.
+
* If the format name already exists in the contained formats list and the format pattern is the same - nothing needs to be done
+
* If the format name already exists in the contained formats list and the format pattern for the contained name is different - an exception is thrown
+
 
+
When using a non standard format name in an IU:
+
* The format definition must also be stored in the IU.
+
 
+
Attempting to redefine pre-defined formats:
+
* The pre-defined formats have higher precedence - should throw an exception
+
 
+
The user interface can:
+
* collect all defined formats from all known repositories and present them when the user is defining a version or range
+
* have a function to define a new format which is stored in the current profile (and thus becomes available for use)
+
 
+
This scheme allows format names to spread virally. The possible downside is potential clashes between repositories (same format name with different definitions in two different repositories) - but this is not a unique problem for version format. A particular IU of a particular version, or a particular artifact of a particular version, could very well be different in two repositories. As an aid/indicator, the UI can flag conflicting formats to the user.
+
 
+
A repository management tool could have a feature to enable modifying/replacing version fomats thus allowing repair.
+
  
 +
Existing Tooling should naturally use the new OmniVersion implementation to parse strings - thus enabling a user to enter a version in raw or format() form. An implementation can choose to present the full version string (i.e. OmniVersion.toString()), or only the original version.
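
The reverse lookup idea in this section can be illustrated with a minimal sketch. It is not part of p2: the registry contents are taken from the format examples on this page, and the names <tt>KNOWN_FORMATS</tt>, <tt>present</tt> and <tt>expand</tt> are hypothetical. The version itself always carries its full format, and the alias is only used when presenting it or when accepting user input.

```python
# Hypothetical tooling-side alias registry: alias name -> format pattern.
# The patterns are the ones listed in the format examples on this page.
KNOWN_FORMATS = {
    "osgi": "n[.n=0;[.n=0;[.S=[a-zA-Z0-9_-];]]]",
    "triplet": "n[.n=0;[.n=0;]][d?S=M;]",
    "string": "S",
}


def present(version_string, registry=KNOWN_FORMATS):
    """Reverse lookup: show 'alias:version' when the format is registered.

    Unknown formats are returned unchanged, so a missing alias never
    breaks anything; the version stays self-describing.
    """
    for alias, pattern in registry.items():
        prefix = "format(" + pattern + "):"
        if version_string.startswith(prefix):
            return alias + ":" + version_string[len(prefix):]
    return version_string


def expand(user_input, registry=KNOWN_FORMATS):
    """Forward lookup: turn 'alias:version' into the full format() form."""
    alias, sep, rest = user_input.partition(":")
    if sep and alias in registry:
        return "format(" + registry[alias] + "):" + rest
    return user_input
```

With this registry, <tt>present("format(S):titanic")</tt> yields <tt>string:titanic</tt>, while a version using an unregistered format is shown exactly as stored.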
  
 
==More examples using 'format'==
 
==More examples using 'format'==
 
A version range with format equivalent to OSGi
 
A version range with format equivalent to OSGi
  format(n[.n=0;[.n=0;[.s]]]):[1.0.0.r12345, 2.0.0]
+
  format(n[.n=0;[.n=0;[.S=[a-zA-Z0-9_-];]]]):[1.0.0.r12345, 2.0.0]
  
 
At least one string, and max 5 strings
 
At least one string, and max 5 strings
  format(s[.s[.s[.s[.s]]]]):vivaldi.opus.spring.bar5
+
  format(S=[^.][.S=[^.];[.S=[^.][.S=[^.][.S=[^.]]]]]):vivaldi.opus.spring.bar5
  format(s(.s){0,4}):vivaldi.opus.spring.bar5  => 'vivaldi', 'opus', 'spring', 'bar5'
+
  format(S=[^.](.S=[^.]){0,4}):vivaldi.opus.spring.bar5  => 'vivaldi'.'opus'.'spring'.'bar5'
  
 
At least one alpha or numerical with auto format and delimiter
 
At least one alpha or numerical with auto format and delimiter
  format(a(d?a)*):vivaldi:opus23-spring.bar5  => 'vivaldi', 'opus', 23, 'spring', 'bar', 5
+
  format(a(d?a)*):vivaldi:opus23-spring.bar5  => 'vivaldi'.'opus'.23.'spring'.'bar'.5
  
 
The texts 'opus' and 'bar' should not be included:
 
The texts 'opus' and 'bar' should not be included:
  format(s['.opus'=ignore;n['.bar'=ignore;n]]):vivaldi.opus23.bar8  => 'vivaldi', 23, 8
+
  format(s[.'opus'n[.'bar'n]]):vivaldi.opus23.bar8  => 'vivaldi'.23.8
 +
 
 +
The first string segment should be ignored - it is a marketing name:
 +
format(s=!;.n(.n)*):vivaldi.1.5.3
  
 
Classic SCCS/RCS style:
 
Classic SCCS/RCS style:
Line 334: Line 469:
  
 
Numeric to optional depth 8, where missing input is set to 0, followed by optional string where 'empty > any'

Numeric to optional depth 8, where missing input is set to 0, followed by optional string where 'empty > any'
  format(n(d?n=0;){0,7}[a=M;]):1.1.1.4:beta  => 1,1,1,4,0,0,0,0,'beta'
+
  format(n(d?n=0;){0,7}[a=M;]):1.1.1.4:beta  => 1.1.1.4.0.0.0.0.'beta'
  format(n(d?n=0;){0,7}[a=M;]):1.1.1.4  => 1,1,1,4,0,0,0,0,MAX
+
  format(n(d?n=0;){0,7}[a=M;]):1.1.1.4  => 1.1.1.4.0.0.0.0.M
  
Uninterpreted single string range
+
Single string range
  format(s):[andrea doria,titanic]
+
  format(S):[andrea doria,titanic]
  
 
==Range examples==
 
==Range examples==
[[CommentBox| Needs to be edited}}
 
 
Examples:
 
Examples:
 
* raw:[1.2.3.'r1234',2.0.0]
 
* raw:[1.2.3.'r1234',2.0.0]
Line 347: Line 481:
 
* format(a+):[monkey.fred.ate.5.bananas,monkey.fred.ate.10.oranges]
 
* format(a+):[monkey.fred.ate.5.bananas,monkey.fred.ate.10.oranges]
 
* [1.0.0,2.0.0] equal to osgi:[1.0.0,2.0.0]
 
* [1.0.0,2.0.0] equal to osgi:[1.0.0,2.0.0]
* format(s):[andrea doria,titanic]
+
* format(S):[andrea doria,titanic]
 
* rpm:[7:4.0.3-3.fc9,8:1] - an example KDE Admin version 7:4.0.3-3.fc9 to 8:1
 
* rpm:[7:4.0.3-3.fc9,8:1] - an example KDE Admin version 7:4.0.3-3.fc9 to 8:1
 
* triplet:[1.0.0.RC1,1.0.0]
 
* triplet:[1.0.0.RC1,1.0.0]
  
 
==Internationalization==
 
==Internationalization==
The proposed types using alphanumerical  segments are assumed to use vanilla string comparison. This does not work so well if versions are expressed in a language where lexical ordering is different. Language specific collation could be supported by combining version type name with the name of a ISO 639 Language code (see java.util.Locale) and where the default would be English. The language could be encoded with a separating '-' e.g. 'format-pt' for collation in Portuguese.
+
Alphanumerical segments use vanilla string comparison as internationalization (lexical ordering/collation) would produce different results for different users.
 
+
This opens up another can of worms (decomposition strength, comparison of locale and non locale specified types, etc.), and it is probably best to implement just basic string comparison. It is also questionable if internationalization is wanted at all, as "known tools" does not support this, and "correct collation" would thus yield a different result.
+
 
+
Support for internationalized collation is not recommended.
+
 
+
 
+
 
+
==Factory API==
+
The factory API could be something simple like this:
+
<pre>
+
public class VersionFactory
+
{
+
    IVersion createVersion(String versionString);
+
    IVersion createVersion(String versionType, String versionString);
+
}
+
public class VersionRangeFactory
+
{
+
    IVersionRange createVersionRange(String versionRangeString);
+
    IVersionRange createVersionRange(String versionType, String versionString);
+
}
+
</pre>
+
Hard to say how much indirection is required - methods could just be static to keep things simple.
+
 
+
If we want to support the pattern based type, the factory methods needs the pattern as well. To make this generic, it could be seen as a paramter to the version type.
+
 
+
<pre>
+
public class VersionFactory
+
{
+
    IVersion createVersion(String versionString);
+
    IVersion createVersion(String versionType, String versionString);
+
    IVersion createVersion(String versionType, String versionTypeParameter, String versionString);
+
}
+
public class VersionRangeFactory
+
{
+
    IVersionRange createVersionRange(String versionRangeString);
+
    IVersionRange createVersionRange(String versionType, String versionString);
+
    IVersionRange createVersionRange(String versionType, String versionTypeParameter, String versionString);
+
}
+
</pre>
+
 
+
When creating a pattern based version, the versionTypeParameter must be supplied. When creating a pattern based version range, the pattern is optional - the pattern of the individual candidates would then be used to create the canonical form of the upper and lower bounds.
+
 
+
==IVersion and IVersionRange API==
+
 
+
Basically follows the current Version and VersionRange classes.
+
  
 
=Applicability=
 
=Applicability=
Line 412: Line 501:
 
* Publisher advice versions
 
* Publisher advice versions
  
=Implementation Steps=
 
Enablement of an alternate implementation of Version and VersionType is wanted in 3.5 even if the OSGi resolver is still used. This enablement is available as a patch.
 
On a parallell track - the Omni-version classes are implemented and tested with either a SAT4J based solution or an interims replacement of the OSGi resolver.
 
It is then possible to verify the functionality and performance of the Omni-version implementation.
 
  
The Omni-version implementation is not expected to go into 3.5 unless there is a SAT4j solution replacing the OSGi resolver, the functionality and performance is satisfactory, and enough testing has taken place.
 
  
The feature to allow introduction of new named formats can wait.
 
 
[[Category:Equinox p2|Version Type]]
 
[[Category:Equinox p2|Version Type]]
 +
 
=FAQ=
 
=FAQ=
 
'''Will users just using Eclipse and OSGi bundles be affected?'''<br/>
 
'''Will users just using Eclipse and OSGi bundles be affected?'''<br/>
No, users that only deal within the OSGi domain can continue to use version strings like before, there is no need to specify version formats. The Enablement that is proposed should also be safe as it is only a level of indirection to the current implementation of Version and VersionRange.
+
No, users that only deal within the OSGi domain can continue to use version strings as before; there is no need to specify version formats. 
  
 
'''How does a user of something know which version type to use? This seems very complicated...'''<br/>
 
'''How does a user of something know which version type to use? This seems very complicated...'''<br/>
To use some non-osgi component with p2, that component must have been made available in a p2 repository. When it was made available, the publisher must have made it available with a specified version format. The publisher must understand the component's version semantics. A consumer of the component can find the version format in the repository (the user must after all know that a capability is available under a certain name, and certain version range).
+
To use some non-osgi component with p2, that component must have been made available in a p2 repository. When it was made available, the publisher must have made it available with a specified version format. The publisher must understand the component's version semantics. A consumer that only wants to install the component does not really need to understand the format, and the original version string is probably sufficient. In scenarios where the consumer needs to know more - what to present is domain specific - some tool could show all non osgi version strings as "non-osgi" or "formatted" with drill down into the actual pattern (or if there is an alias registry available, it could reverse lookup the format).
  
 
'''Will open (osgi) ranges produce lots of false positives?'''<br/>
 
'''Will open (osgi) ranges produce lots of false positives?'''<br/>
Very unlikely. One decision to minimize the risk was to specify that integer segments are considered to be later than string segments.
+
Very unlikely. One decision to minimize the risk was to specify that integer segments are considered to be later than array and string segments.
 
(We also felt that version segments specified with integers are more "precise"). Note that to be included in the range, the required capability would still need to be in a matching name space, and have a matching name. To introduce a false positive, the publisher of the false positive would need to a) publish something already known to others (namespace and name), b) misinterpret how its versioning scheme works and publish it with a format of n.n.n.n (or n.n.n.s.<something>), c) first learn how to actually specify such a format and how to publish it to a p2 repository, and d) then persuade users to use the repository.

(We also felt that version segments specified with integers are more "precise"). Note that to be included in the range, the required capability would still need to be in a matching name space, and have a matching name. To introduce a false positive, the publisher of the false positive would need to a) publish something already known to others (namespace and name), b) misinterpret how its versioning scheme works and publish it with a format of n.n.n.n (or n.n.n.s.<something>), c) first learn how to actually specify such a format and how to publish it to a p2 repository, and d) then persuade users to use the repository.
  
 
'''What happens when a capability is available with several versioning schemes?'''<br/>
 
'''What happens when a capability is available with several versioning schemes?'''<br/>
A typical case would be some java package that is versioned at the source using triplet notation, and the same package is also made available using osgi notation (which is a mistake).
+
A typical case would be some java package that is versioned at the source using triplet notation, and the same package is also made available using osgi notation (which, incidentally, is a mistake).
  
 
As an example, the following capabilities are found:
 
As an example, the following capabilities are found:
Line 470: Line 554:
  
 
'''What if the publisher of a component changes versioning scheme - what happens to ranges?'''<br/>
 
'''What if the publisher of a component changes versioning scheme - what happens to ranges?'''<br/>
The order among the versions will be correct as long as the versions are published using the correct notation. The only implication is that users must understand that a query for triplet:x.x.x means raw:x.x.x.maxs - e.g. osgi:[1.0.0,2.0.0] != triplet:[1.0.0,2.0.0] (osgi upper range of 2.0.0 would not match triplet published 2.0.0, and triplet lower range of 1.0.0 would not match osgi published 1.0.0).
+
The order among the versions will be correct as long as the versions are published using the correct notation. The only implication is that users must understand that a query for triplet:1.2.3 means raw:1.2.3.m - e.g. osgi:[1.0.0,2.0.0] != triplet:[1.0.0,2.0.0] (osgi upper range of 2.0.0 would not match triplet published 2.0.0, and triplet lower range of 1.0.0 would not match osgi published 1.0.0).
  
 
'''Why not use regexp instead of the special pattern format?'''<br/>
 
'''Why not use regexp instead of the special pattern format?'''<br/>
Line 477: Line 561:
 
'''Pattern parsing looks like it could have performance implications - what are the expectations here?'''<br/>
 
'''Pattern parsing looks like it could have performance implications - what are the expectations here?'''<br/>
 
The intention is to use a mechanism similar to regular expressions - when a format is first seen it is compiled to an internal structure. The compiled structure is cached and reused for all subsequent occurrences of the same format. A test will be performed to compare current parsing of an OSGi version string with the pattern based parsing. Once parsed, all comparisons are made using the raw vector, which should be comparable in speed to the current implementation.

The intention is to use a mechanism similar to regular expressions - when a format is first seen it is compiled to an internal structure. The compiled structure is cached and reused for all subsequent occurrences of the same format. A test will be performed to compare current parsing of an OSGi version string with the pattern based parsing. Once parsed, all comparisons are made using the raw vector, which should be comparable in speed to the current implementation.
 +
 +
Also note that the Engine does not have to parse and apply the format to the original string unless code explicitly asks for it, and this is not the normal case during provisioning.
  
 
'''Why not just let the publisher deal with transforming the version into canonical form?'''<br/>
 
'''Why not just let the publisher deal with transforming the version into canonical form?'''<br/>
There are several reasons:
+
The proposal allows this - the publisher is not required to make the format available. We think this is reasonable in domains where humans are not involved in the authoring (or the consumption).
 +
 
 +
There are several reasons why it is a good idea to include the original version string as well as the format:
 
* the original version strings need to be kept as users would probably not understand the canonical representation in many cases.

* the original version strings need to be kept as users would probably not understand the canonical representation in many cases.
 
* if the transformation pattern is not available a user would not be able to create a request without hand coding the canonical form
 
* if the transformation pattern is not available a user would not be able to create a request without hand coding the canonical form
Line 491: Line 579:
 
There are several reasons:
 
There are several reasons:
 
* this would mean that the version string would need to be preprocessed as it would not have \ embedded from the start
 
* this would mean that the version string would need to be preprocessed as it would not have \ embedded from the start
* all version strings that use \ as a delimiter would need to be pre-processed to escape the the \
+
* all version strings that use \ as a delimiter would need to be pre-processed to escape the \
 
* to date, authors of this proposal have not seen a version format that requires a mix of quotes
 
* to date, authors of this proposal have not seen a version format that requires a mix of quotes
 +
* In the unlikely event that such strings are present it is possible to concatenate several strings in the raw format.
 
* parsing performance is affected
 
* parsing performance is affected
 +
 +
'''Which format should I use?'''
 +
If you have the opportunity to select a versioning scheme - stick with OSGi.
  
 
=External Links=
 
=External Links=

Latest revision as of 07:56, 17 October 2009

This proposal is implemented and is part of p2 as of 3.5M5

Introduction

This page describes the "Omni Version" - an implementation of Version and VersionRange classes in Equinox p2 that enables p2 to handle other versioning schemes than OSGi. See bug 233699 for discussion.

Background

There are other versioning schemes in wide use that are not compatible with OSGi version and version ranges. The problem is both syntactic and semantic.

Example of semantic issues

Many open source projects do their versioning in a fashion similar to OSGi but with one very significant difference. For two versions that are otherwise equal, a lack of qualifier signifies a higher version than when a qualifier is present. I.e.

1.0.0.alpha 
1.0.0.beta
1.0.0.rc1
1.0.0

The 1.0.0 is the final release. The qualifier happens to be in alphabetical order here but that's not always true.

Mozilla Toolkit versioning has many rules. Each segment has 4 optional slots (numeric, string, numeric, and string), where each slot has a default value of 0 or "max string" if missing.

1.2a3b.  // yes, a trailing . is allowed and means .0
1.a2

Mozilla also allows bumping the version (using an older Mozilla scheme)

1.0+ 

This means 1.1pre in mozilla.

Example of syntax issue

Here are some examples of versions used in Red Hat Fedora distributions.

KDE Admin version 7:4.0.3-3.fc9
Compat libstdc version 33-3.2.3-63
Automake 1.4p6-15.fc7

And here are some mozilla toolkit versions:

1.*.1
1.0+
1.-1  // yes, negative integer version numbers are allowed, the - is not a delimiter
1.2a3b.a

These are not syntactically compatible with OSGi versions.

Current implementation in p2 3.5M5

The current implementation in p2 uses the OSGi resolver to create the final step of a provisioning plan. This means that versions that can not be converted to OSGi will cause the planner to stop with an error. This is expected to be fixed when a SAT4J based planner is used.

Solution

One implementation of Version and VersionRange

Equinox p2 has one implementation of Version and one of VersionRange (referred to as OmniVersion and OmniVersionRange to indicate that they are capable of capturing the semantics of various version formats). The advantages over previously proposed implementations are that there is no need to dynamically plug in new implementations, and that new formats can be introduced more easily.

Even if the finished solution only requires a single implementation (the OmniVersion discussed below), there are other factors to consider. The current p2 SimplePlanner uses the OSGi planner, and it can only understand OSGi versions. There is work being done on SAT4J to enable it to be used instead of the OSGi planner (work to handle "explanations" could also be used to handle "attachments", which are now handled with the OSGi planner).

See bug 233699 for more information.

One Canonical Format

The OmniVersion and OmniVersionRange are "universal" - all instances of Version should be comparable against each other with a fully defined (unambiguous) ordering. The API is (as today) based on a single string fully describing a version or version range.

The canonical string format is called "raw" and it is explained in more detail below. To ensure backwards compatibility, as well as providing ease of use in an osgi environment, version strings that are not prefixed with an OmniVersion keyword (e.g. "raw") have the same format and semantics as the current osgi version format.

As an example, the following two version strings are both valid input, and express exactly the same version:

1.0.0.r1234
raw:1.0.0.'r1234'
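
The equivalence above can be illustrated with a small sketch that rewrites an OSGi version string into the raw form. It follows the osgi pattern given among the format examples on this page (missing minor and micro default to 0, the qualifier becomes a quoted string segment); the function name <tt>osgi_to_raw</tt> is hypothetical and the actual p2 implementation may differ in details.

```python
import re

# major[.minor[.micro[.qualifier]]], qualifier limited to [a-zA-Z0-9_-]
_OSGI = re.compile(
    r"([0-9]+)(?:\.([0-9]+)(?:\.([0-9]+)(?:\.([A-Za-z0-9_-]+))?)?)?"
)


def osgi_to_raw(version):
    """Return the raw form of an OSGi version string (illustrative only)."""
    match = _OSGI.fullmatch(version)
    if match is None:
        raise ValueError("not an OSGi version: %r" % version)
    major, minor, micro, qualifier = match.groups()
    # Numeric segments are written as plain integers; missing ones are 0.
    segments = [str(int(major)), str(int(minor or 0)), str(int(micro or 0))]
    if qualifier is not None:
        # String segments are quoted in the raw format.
        segments.append("'" + qualifier + "'")
    return "raw:" + ".".join(segments)
```

For the example above, <tt>osgi_to_raw("1.0.0.r1234")</tt> gives <tt>raw:1.0.0.'r1234'</tt>.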

Implementation of Omni Version and VersionRange

OmniVersion

The OmniVersion implementation uses a vector to store version segments in order of descending significance. A segment is an instance of Integer, String, Comparable[], MaxInteger, MaxString, or Min.

Comparison

Comparison is done by iterating over segments from 0 to n.

  • If segments are of different type the rule MaxInteger > Integer > Comparable[] > MaxString > String is used - the comparison is done and the version with the greater segment type is reported as greater.
  • If segments are of equal type - they are compared - if one is greater the comparison is done and the version with the greater segment is reported as greater.
  • All versions are by default padded with -M (absolute min segment) "to infinity". A version may have an explicit pad element which is used instead of the default.
  • A shorter version is compared to a longer by comparing the extra segments in the longer version against the shorter version's pad segment.
  • If all segments are equal up to end of the longest segment array, the pad segments are compared, and the version with the greater pad segment is reported as greater.
  • If pad segments are also equal the two versions are reported as equal.
  • As a consequence of not including delimiters in the canonical format; two versions are equal if they only differ on delimiters.

As an example - here is a comparison of versions (expressed in the raw format introduced further on in the text - 'p' means that a pad element follows, and -M the absolute min segment):

 1p-M < 1.0.0 < 1.0.0p0 == 1p0 < 1.1 < 1.1.1 < 1p1 == 1.1p1 < 1pM
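
The comparison rules above can be sketched as follows. This is a minimal sketch written in Python for brevity, not the p2 implementation: the names Symbol, Vec, rank, compare_segments and compare are all hypothetical, and a Vec stands in for both a whole version and a sub array.

```python
from dataclasses import dataclass
from typing import Any


class Symbol:
    """A named marker segment such as 'M', 'm' or '-M'."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


MIN = Symbol("-M")      # absolute min, the default pad
MAX_STR = Symbol("m")   # max string
MAX = Symbol("M")       # absolute max


@dataclass
class Vec:
    """A vector of segments plus the pad used beyond its end."""
    segments: list
    pad: Any = MIN


def rank(seg):
    # Type order: -M < String < MaxString < array < Integer < M
    if seg is MIN:
        return 0
    if isinstance(seg, str):
        return 1
    if seg is MAX_STR:
        return 2
    if isinstance(seg, Vec):
        return 3
    if isinstance(seg, int):
        return 4
    if seg is MAX:
        return 5
    raise TypeError(f"not a segment: {seg!r}")


def compare_segments(a, b):
    ra, rb = rank(a), rank(b)
    if ra != rb:
        return -1 if ra < rb else 1
    if isinstance(a, Vec):
        return compare(a, b)
    if isinstance(a, (int, str)):
        return (a > b) - (a < b)
    return 0  # identical marker segments


def compare(a, b):
    """Return -1, 0 or 1 for a < b, a == b, a > b."""
    for i in range(max(len(a.segments), len(b.segments))):
        # A shorter version contributes its pad for the missing segments.
        sa = a.segments[i] if i < len(a.segments) else a.pad
        sb = b.segments[i] if i < len(b.segments) else b.pad
        result = compare_segments(sa, sb)
        if result != 0:
            return result
    # All segments equal: the pads decide.
    return compare_segments(a.pad, b.pad)


print(compare(Vec([1, 0, 0], 0), Vec([1], 0)))  # 1.0.0p0 == 1p0
```

Running the chain of versions from the example through compare reproduces the stated ordering.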

Raw and Original Version String

The original version should be kept when the raw version format is used, but it is not an absolute requirement as simple raw based forms such as raw:1.2.3.4.5 could certainly be used by humans. Someone (who for some reason does not want to use osgi or some other version scheme), could elect to use the raw format as their native format.

A version string with raw and original is written in the form:

  'raw' ':' raw-format-string '/' format(...):original-format-string

The p2 Engine completely ignores the original part - only the raw part is used, and the original format is only used for human consumption.

Example using a mozilla version string (as it has the most complex format encountered to date).

  raw:<1.m.0.m>.<20.'a'.3.'b'>p<0.m.0.m>/format((<n=0;?s=m;?n=0;?s=m;?>(.<n=0;?s=m;?n=0;?s=m;?>)*)=p<0.m.0.m>;):1.20a3b.a

An original version string can be included with unknown format:

  raw:<1.m.0.m>.<20.'a'.3.'b'>p<0.m.0.m>/:1.20a3b.a

See below for full explanation of the raw format.
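
To make the layout of such a string concrete, here is a hedged sketch that splits a version string into its raw part, its optional format pattern, and its optional original string. The function split_version is hypothetical. It skips quoted strings and escaped characters when looking for the '/' and for the parenthesis that closes format(...), but it does not handle an unescaped ')' inside a character list such as [^)].

```python
def split_version(text):
    """Return (raw, pattern, original); pattern and original may be None."""
    if not text.startswith("raw:"):
        raise ValueError("not a raw version string")
    body = text[len("raw:"):]

    # Find the first '/' that is not inside a quoted string.
    quote = None
    slash = -1
    for i, ch in enumerate(body):
        if quote:
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "/":
            slash = i
            break
    if slash < 0:
        return body, None, None

    raw, rest = body[:slash], body[slash + 1:]
    if rest.startswith(":"):
        return raw, None, rest[1:]  # original with unknown format
    if not rest.startswith("format("):
        raise ValueError("expected 'format(' or ':' after '/'")

    # Scan for the ')' that balances the '(' of 'format('.
    depth = 0
    quote = None
    i = len("format")
    while i < len(rest):
        ch = rest[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch == "\\":
            i += 1  # skip the escaped character
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                break
        i += 1
    if i >= len(rest) or rest[i + 1:i + 2] != ":":
        raise ValueError("malformed format(...) section")
    return raw, rest[len("format("):i], rest[i + 2:]


print(split_version("raw:1.2.3.m/format(n.n.n):1.2.3"))
```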

OmniVersionRange

The OmniVersionRange holds two OmniVersion instances (lower and upper bound). A version range string uses the delimiters '[]', '()' and ','. If these characters are used in the lower or upper bound version strings, these occurrences must be escaped with '\' and occurrences of '\' must also be escaped.

The version range is either an osgi version range (if raw prefix is not used), or a raw range. The format of the raw range is:

  'raw' ':' ( '[' | '(' ) raw-format-string ',' raw-format-string ( ']' | ')' ) 

The raw-range can be followed by the original range:

  raw-range '/' 'format' '(' format-string ')' ':' ( '[' | '(' ) original-format-string ',' original-format-string ( ']' | ')' )

An original version range can be included with unknown format:

  raw: [<1.m.0.m>.<20.m.0.m>p<0.m.0.m>,<1.m.0.m>.<20.'a'.3.'b'>p<0.m.0.m>]/:[1.20,1.20a3b.a]

The p2 Engine completely ignores the original part - only the raw part is used, and the original format is only used for human consumption.

See below for full explanation of the raw format.
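
The escaping rule for range bounds can be sketched as below. Both helpers (escape_bound and parse_range) are hypothetical names for illustration; the sketch only splits a range into its two bound strings and inclusive/exclusive flags, and leaves parsing of the bounds themselves to the version parser.

```python
RANGE_SPECIAL = "[](),\\"  # characters that must be escaped inside a bound


def escape_bound(text):
    """Escape range delimiters and backslashes in a bound string."""
    return "".join("\\" + ch if ch in RANGE_SPECIAL else ch for ch in text)


def parse_range(text):
    """Split '[low,high)' into (low, low_inclusive, high, high_inclusive)."""
    text = text.strip()
    if len(text) < 2 or text[0] not in "[(" or text[-1] not in "])":
        raise ValueError("range must start with '[' or '(' and end with ']' or ')'")
    low_inclusive = text[0] == "["
    high_inclusive = text[-1] == "]"
    inner = text[1:-1]

    bounds, current = [], []
    i = 0
    while i < len(inner):
        ch = inner[i]
        if ch == "\\" and i + 1 < len(inner):
            current.append(inner[i + 1])  # keep the escaped character literally
            i += 2
            continue
        if ch == ",":
            bounds.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    bounds.append("".join(current))
    if len(bounds) != 2:
        raise ValueError("a range needs exactly two bounds")
    # White space between the delimiters and the version strings is accepted.
    return bounds[0].strip(), low_inclusive, bounds[1].strip(), high_inclusive


print(parse_range("[ 1.0, 2.0 )"))
```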

Other range formats

Note that some version schemes have range concepts where the notion of inclusive or exclusive does not exist, and instead use symbolic markers such as "next larger", "next smaller", or use wildcards to define ranges. In these cases, the publisher of an IU must use discrete versions and the inclusive/exclusive notation to define the same range.

Some range specifications allow the specification of unions, or the exclusion of certain versions. This is not yet supported by p2. If introduced, it could be expressed as a series of ranges where a ^ before a range negates it. Example: [0,1][3,10]^[3.1,3.7) is equivalent to [0,10]^(1,3)^[3.1,3.7).

Format Specification

There are two basic formats: the default osgi string format, and the raw canonical string format. There are also two corresponding range formats: osgi-version-range and raw-version-range.

The raw format is a string representation of the internally used format - it consists of the keyword "raw", followed by a list of entries separated by period. An entry can be numerical, quoted alphanumerical, or a sub canonical list in the same format. A canonical version (and sub canonical version arrays) can be padded to infinity with a special padding element. Special entries express the notion of 'max integer' and 'max string'.

The osgi string format is the well known format in current use.

The raw format in BNF:

   digit: [0-9];
   letter: [a-zA-Z];
   numeric : digit+;
   alpha : letter+;
   alpha-numeric : [0-9a-zA-Z]+;
   delimiter: [^0-9a-zA-Z];
   character: .;
   characters: .+;
   quoted-string: ("[^"]*")|('[^']*');  // i.e. a sequence of characters quoted with " or ', where ' can be used in a " quoted string and vice versa
   range-safe-string:  TBD; // a sequence of any characters but with ',' ']', ')' and '\' escaped with '\';
   sq: ['];
   dq: ["];

   version :
      | osgi-version
      | raw-version
      ;
   osgi-version :
      | numeric
      | numeric '.' numeric
      | numeric '.' numeric '.' numeric
      | numeric '.' numeric '.' numeric '.' .+
      ;
   raw-version : 
      | 'raw' ':' raw-segments optional-original-version
      ;
   optional-original-version :
      |
      | '/' original-version
      ;
   version-range : 
      | osgi-version-range
      | raw-version-range
      ;
   rs : ('[' | '(') ;
   re : (']' | ')') ;

   osgi-version-range : 
      | rs osgi-version ',' osgi-version re
      ;
   raw-version-range : 
      | 'raw' ':' rs raw-segments ',' raw-segments re optional-original-range
      ;
   optional-original-range :
      | 
      | '/' original-range
      ;

   raw-segments : 
      | raw-elements optional-pad-element
      ;
   raw-elements :
      | raw-elements '.' raw-element
      | raw-element
      ;
   raw-element :
      | numeric
      | quoted-strings  // strings are concatenated
      | '<' raw-elements optional-pad-element '>'   // subvector of elements
      | 'm'   // symbolic 'maxs' == max string 
      | 'M'   // symbolic 'absolute max' i.e. max > MAX_INT > maxs
      | '-M'  // symbolic 'absolute min' i.e. -M <  empty string < array <  int
      ;
   optional-pad-element :
      |
      | pad-element 
      ;
   quoted-strings :
      | quoted-strings quoted-string
      | quoted-string
      ;
   pad-element :
      | 'p' raw-element
      ;

   original-version :
      | optional-format-definition ':' .*
      ;
   original-range :
      | optional-format-definition ':' rs range-safe-string ',' range-safe-string re
      ;
   optional-format-definition :
      | 
      | format-definition
      ;
   format-definition :
      | 'format' '(' pattern ')'
      ;

   // Definition of parsing patterns
   //
   pattern :
      | pattern pattern-element
      | pattern-element 
      ;
   pattern-element :
      | pelem optional-processing-rules optional-pattern-range
      | '[' pattern ']' optional-processing-rules
      ;
   optional-processing-rules :
      | optional-processing-rules '=' processing-rule ';'
      | '=' processing-rule ';'
      | 
      ;
   optional-pattern-range :
      | repeat-range
      | 
      ;

   pelem :
      | 'r' | 'd' | 'p' | 'a' | 's' | 'S' |  'n' | 'N' | 'q'
      | '(' pattern ')'
      | '<' pattern '>'
      | delimiter
      ;
   repeat-range : 
      | '?' | '*' | '+'
      | '{' exact '}'
      | '{' at-least ',' '}'
      | '{' at-least ',' at-most '}'
      ;

   exact : at-least : at-most : numeric ;

   processing-rule :
      | raw-element
      | pad-element
      | '!' 
      | '[' char-list ']'
      | '[' '^' char-list ']'
      | '{' exact '}'   // for character count
      | '{' at-least ',' '}'
      | '{' at-least ',' at-most '}'
      ;
   char-list: TBD ; // Sequence of any character but with '^', ']' and '\' escaped with '\' 
   delimiter : 
      | [!#$%&/=^,.;:-_ ] // Any non-alpha-num that has no special meaning
      | quoted-string
      | '\' .  // any escaped character
      ;
   


Examples:

  • OSGi 1.0.0.r1234 is expressed as raw:1.0.0.'r1234'
  • apache/triplet style 1.2.3 is expressed as raw:1.2.3.m
  • mozilla style 1a.2a3c. can be expressed as raw:<1.'a'.0.m>.<2.'a'.3.'c'>p<0.m.0.m> (mozilla is a complex format - see external links at the end of page).
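
To show how the raw grammar maps onto a data structure, here is a small recursive descent parser for raw-segments. It is a sketch under assumptions, not the p2 parser: Sym, RawParser and parse_raw are hypothetical names, a sub array is returned as a (segments, pad) tuple, and the optional original version part is not handled.

```python
import enum


class Sym(enum.Enum):
    MIN = "-M"      # absolute min
    MAX_STR = "m"   # max string
    MAX = "M"       # absolute max


def _is_digit(ch):
    return ch.isascii() and ch.isdigit()


class RawParser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def segments(self):
        """raw-elements optional-pad-element -> (list, pad or None)"""
        elements = [self.element()]
        while self.peek() == ".":
            self.pos += 1
            elements.append(self.element())
        pad = None
        if self.peek() == "p":
            self.pos += 1
            pad = self.element()
        return elements, pad

    def element(self):
        ch = self.peek()
        if _is_digit(ch):
            start = self.pos
            while _is_digit(self.peek()):
                self.pos += 1
            return int(self.text[start:self.pos])
        if ch in ("'", '"'):
            parts = []
            while self.peek() in ("'", '"'):  # adjacent strings are concatenated
                quote = self.peek()
                end = self.text.index(quote, self.pos + 1)
                parts.append(self.text[self.pos + 1:end])
                self.pos = end + 1
            return "".join(parts)
        if ch == "<":
            self.pos += 1
            inner = self.segments()
            if self.peek() != ">":
                raise ValueError("missing '>'")
            self.pos += 1
            return inner
        if self.text.startswith("-M", self.pos):
            self.pos += 2
            return Sym.MIN
        if ch == "m":
            self.pos += 1
            return Sym.MAX_STR
        if ch == "M":
            self.pos += 1
            return Sym.MAX
        raise ValueError(f"unexpected input at position {self.pos}")


def parse_raw(text):
    if text.startswith("raw:"):
        text = text[len("raw:"):]
    parser = RawParser(text)
    result = parser.segments()
    if parser.pos != len(text):
        raise ValueError(f"trailing input at position {parser.pos}")
    return result


print(parse_raw("raw:1.0.0.'r1234'"))
```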

Format Pattern Explanation

Here are explanations for the rules in format(pattern).

rule description
r raw - matches one raw-element as specified by the raw format. The 'r' rule does not match a pad element - use 'p' for this.
'characters' matches a single character or sequence of characters - the matched result is not included in the resulting canonical vector (i.e. it is not a segment). A '\\' is needed to include a single '\'. The sequence of chars acts as one delimiter.
non-alphanum character matches any non alpha-numerical character (including space) - the matched result is not included in the canonical vector (i.e. it is not a segment). A non alphanumerical character acts as a delimiter. Special characters must be escaped when wanted as delimiters.
a auto - a sequence of digits creates a numeric segment, a sequence of alphabetical characters creates a string segment. Segments are delimited by any character not having the same character class as the first character in the sequence, or by the following delimiter. A numerical sequence ignores leading zeros.
d delimiter; matches any non alpha-numeric character. The matched result is not included in the resulting canonical vector (i.e. it is not a segment).
s a string group matching only alpha characters (i.e. "letters"). Use processing rules =[]; or =[^] to define the set of allowed characters. It is possible to allow inclusion of delimiter chars, but not inclusion of digits.
S a string group matching any group of characters. Use processing rules =[]; or =[^] to define the set of allowed characters. Care must be taken to specify exclusion of a delimiter if elements are to follow the 'S'.
n a numeric (integer) group with value >= 0. Leading zeros are ignored.
N a possibly negative value numeric (integer) group. Leading zeros are ignored.
p parses an explicit pad-element in the input string as defined by the raw format. To define an implicit pad as part of the pattern use the processing instruction =p...;. A pad element can only be last in the overall version string, or last in a sub array.
q smart quoted string - matches a quoted alphanumeric string where the quote is determined by the first character of the string segment. The quote must be a non alphanumeric character, and the string must be delimited by the same character, except for brackets and parentheses (i.e. (), {}, [], <>), which are handled as pairs; thus 'q' matches "<andrea-doria>" and produces a single string segment with the text 'andrea-doria'. A non-quoted sequence of characters is not matched by 'q'.
() indicates a group
< > indicates a group, where the resulting elements of the group are placed in an array, and the array is one resulting element in the enclosing result
? zero to one occurrence of the preceding rule
* zero to many occurrences of the preceding rule
+ one to many occurrences of the preceding rule
{n} exactly n occurrences of the preceding rule
{n,} at least n occurrences of the preceding rule
{n,m} at least n occurrences of the preceding rule, but not more than m times
[ ] short hand notation for an optional group. Is equivalent to ()?
=processing; an additional processing rule is applied to the preceding rule. The processing part can be:
  • raw-element - use this raw-element (as defined by the raw format) as the default value if input is missing. The default value does not have to be of the same type (e.g. "s=123;?" produces an integer segment of value 123 if the optional s is not matched).
  • ! - if input is present do not turn it into a segment (i.e. ignore what was matched)
  • [<list of chars>] - when applied to a 'd' defines the set of delimiters. The characters ], ^, and \ must be escaped with \ to be used in the list of chars. Example: d=[+-/]; One or several ranges of characters such as "a-z" can also be used. Example: d=[a-zA-Z0-9_-];
  • [^<list of chars>] - when applied to a 'd' defines the set of delimiters to be all non alpha numeric except the listed characters. The characters ], ^, and \ must be escaped with \ to be used in the list of chars. One or several ranges of characters such as "a-z" can also be used. Example: d=[^$];
  • praw-element - defines "padding to infinity with specified raw-element" when applied to an array, or a group enclosing the entire format. Example: format((n.s)=pM;) The pad processing rule is only applied to a parsed array, not to a default value for an array. If padding is wanted in the default array value, it can be expressed explicitly in the default value.
  • {n} {n,} {n,m} character ranges - with the same meaning as the rules with the same syntax, but limits the range in characters matched in the preceding 's', 'S', 'n', 'N', 'q', or 'a' rules. For 'q' the quotes do not count.
\ escape removes the special meaning of a character and must be used if a special character is wanted as a delimiter. A '\\' is needed to include a '\'. Escaping a non special character is superfluous but allowed.

Additional rules:

  • if a rule produces a null segment, it is not placed in the result vector e.g. format(ndddn):10-/-12 => raw:10.12
  • Processing (i.e. default values) applied to a group has higher precedence than individual processing inside the group if the entire group was not successfully matched.
  • Parsing is greedy - format(n(.n)*(.s)*) will interpret 1.2.3.hello as raw:1.2.3.'hello' (as opposed to being reluctant which would produce raw:1.'2'.'3'.'hello')
  • When combining N with ={...}; and the input has a negative number, the "-" is included in the character count - "format(N={3};N={2};):-1234" results in "raw:-12.34"
  • When combining n or N with ={...} and input has leading zeros - these are included in the character count.
  • An empty version string is always considered to be an error.
  • A format that produces no segments is always considered to be an error.
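
The first rule can be illustrated by translating the single pattern ndddn by hand into a regular expression. This is only a sketch for this one pattern (the function parse_ndddn is hypothetical); it is not how the pattern parser works in general.

```python
import re

# n = digits, d = any single non alpha-numeric character
NDDDN = re.compile(r"(\d+)[^0-9A-Za-z]{3}(\d+)")


def parse_ndddn(text):
    """Apply format(ndddn): delimiters produce no segments."""
    match = NDDDN.fullmatch(text)
    if match is None:
        raise ValueError(f"input does not match ndddn: {text!r}")
    # int() drops leading zeros, as the 'n' rule does.
    return "raw:" + ".".join(str(int(group)) for group in match.groups())


print(parse_ndddn("10-/-12"))
```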

Note about white space in the raw format:

  • white space is accepted inside quoted strings - i.e. "1.'a string'" is allowed, but not "1. 2"
  • white space is accepted between version range delimiters and version strings - i.e. [ 1.0, 2.0 ] is allowed.

Note about timestamps: An earlier proposal had a 't' rule, but this rule has been deprecated because of the complexity. Instead, the creator of an IU should simply use 's' or 'n' and ensure comparability by using a fixed number of characters when choosing 's' format.
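
The point about a fixed number of characters can be demonstrated with a short sketch. The helper timestamp_segment and the chosen field layout (year, month, day, hour, minute) are assumptions made for this example only.

```python
def timestamp_segment(year, month, day, hour, minute):
    """Format a timestamp as a fixed-width, zero-padded string."""
    return f"{year:04d}{month:02d}{day:02d}{hour:02d}{minute:02d}"


early = timestamp_segment(2008, 12, 13, 9, 5)
late = timestamp_segment(2008, 12, 13, 12, 33)
print(early, late, early < late)

# Without padding, the same two timestamps compare the wrong way round
# when they are treated as strings.
unpadded_early = "20081213" + "9" + "5"
unpadded_late = "20081213" + "12" + "33"
print(unpadded_early < unpadded_late)
```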

Examples of Version Formats

Here are examples of various version formats expressed using the format pattern notation. The examples also show a proposed notation of using aliases for formats. (See the section 'Tooling Support')

type name pattern comment
osgi n[.n=0;[.n=0;[.S=[a-zA-Z0-9_-];]]] Example: the following are equivalent:
  • format(n[.n=0;[.n=0;[.S=[a-zA-Z0-9_-];]]]):1.0.0.r1234
  • raw:1.0.0.'r1234'
  • osgi:1.0.0.r1234
  • 1.0.0.r1234
triplet n[.n=0;[.n=0;]][d?S=M;] A variation on OSGi, with the same syntax, but where a lack of qualifier > any qualifier, and the qualifier may contain any character. The following are all equivalent:
  • format(n[.n=0;[.n=0;]][d?S=M;]):1.0.0
  • raw:1.0.0.M
  • triplet:1.0.0
jsr277 n(.n=0;){0,3}[-S=m;] As defined by JSR 277 - but is provisional and subject to change as it is expected that compatibility with OSGi will be solved (they are now incompatible because of the fourth numeric field with default value 0). The jsr277 format is similar to triplet, but with 4 numeric segments and a '-' separating the qualifier to allow input of "1-qualifier" to mean "1.0.0.0-qualifier". As in triplet, a lack of qualifier > any qualifier. The following are all equivalent:
  • format(n(.n=0;){1,3}[-S=m;]):1.0.0
  • raw:1.0.0.0.m
  • jsr277:1.0.0
tripletSnapshot n[.n=0;[.n=0;[-n=M;.S=m;]]] Format used when maven transforms versions like 1.2.3-SNAPSHOT into 1.2.3-<buildnumber>.<timestamp> ensuring that it is compatible with triplet format if missing <buildnumber>.<timestamp> at the end (format produces max, max-string if they are missing).

Example: the following are equivalent:

  • format(n[.n=0;[.n=0;[-n=M;.S=m;]]]):1.2.3-45.20081213:1233
  • raw:1.2.3.45.'20081213:1233'
  • tripletSnapshot:1.2.3-45.20081213:1233
rpm <[n:]a(d?a)*>[-n[dS=!;]] RPM format matches [EPOCH:]VERSION-STRING[-PACKAGE-VERSION], where epoch is optional and numeric, version-string is auto matched to arbitrary depth >= 1, followed by a package-version, which consists of a build number separated by any separator from a trailing platform specification, or the string 'src' to indicate that the package is a source package. This format allows the platform and src part to be included in the version string, but if present it is not used in the comparisons. The platform type vs source is expected to be encoded elsewhere in such an IU. Everything except the build number is placed in an array, as the build number is only compared if there is a tie.

An example of equivalent expressions:

  • format(<[n:]a(d?a)*>[-n[dS=!;]]):33:1.2.3a-23/i386
  • raw:<33.1.2.3.'a'>.23
mozilla (<n=0;?s=m;?n=0;?s=m;?>(.<n=0;?s=m;?n=0;?s=m;?>)*)=p<0.m.0.m>; Mozilla versions are somewhat complicated: a version consists of 1 or more parts separated by period. Each part consists of 4 optional 'fragments' (numeric, string, numeric, string), where numeric fragments are 0 if missing, and string fragments are MAX-STRING if missing. The versions use padding so that 1 == 1.0 == 1.0.0 == 1.0.0.0 etc.
string S Perhaps superfluous, but makes this version format appear in a selectable list of formats.
auto a(d?a)* Perhaps superfluous, but makes this version format appear in a selectable list of formats, and it serves like a "catch all".
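
As a worked example of one row in the table, the sketch below applies the mozilla pattern by hand: each period-separated part is split into its four optional fragments and the defaults are filled in. It is a simplification for illustration (mozilla_to_raw is a hypothetical name, and only the fragments named in the table are handled).

```python
import re

# One part: optional numeric, alpha, numeric, alpha fragments.
PART = re.compile(r"(\d+)?([A-Za-z]+)?(\d+)?([A-Za-z]+)?")


def mozilla_to_raw(version):
    """Translate a mozilla style version to its raw form (sketch)."""
    if not version:
        raise ValueError("an empty version string is an error")
    arrays = []
    for part in version.split("."):
        match = PART.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse part: {part!r}")
        n1, s1, n2, s2 = match.groups()
        fragments = [
            str(int(n1)) if n1 else "0",   # missing numeric -> 0
            f"'{s1}'" if s1 else "m",      # missing string -> max string
            str(int(n2)) if n2 else "0",
            f"'{s2}'" if s2 else "m",
        ]
        arrays.append("<" + ".".join(fragments) + ">")
    # Pad so that 1 == 1.0 == 1.0.0 and so on.
    return "raw:" + ".".join(arrays) + "p<0.m.0.m>"


print(mozilla_to_raw("1a.2a3c"))
```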

Tooling Support

The OmniVersion is not designed to be extended. Earlier we proposed that it should be possible to define named aliases for common formats and that these formats should be parse-able by the OmniVersion parser. The reason for introducing aliases was to make it possible for users to enter something like "triplet:1.0.0" instead of entering the more complicated format. This did however raise a lot of questions: Who can define an alias? What if the definition of the alias is changed? Where are the alias definitions found? Is it possible to work at all with a version that is using only an alias - what if I want to modify a range and do not have access to the alias?

We instead propose that alias handling is a tooling concern. Tooling should keep a registry of known formats. When a version is to be presented, the format string is "reverse looked up" in the registry - and the alias name can be presented instead of the actual format. This way, the version is always self describing. There is still the need to get "well known formats" and make them available in order to make it easier to use non OSGi versions in publishing tools - but there is no absolute requirement to support this in all publishing tools (some may even operate in a domain where version format is implied by the domain) - and there is no "breakage" because an alias is missing.

Tooling support can be as simple as just having preferences where formats are associated with names - the user can enter new formats and aliases. Some import mechanism is probably also nice to have. A further idea is that aliases could be published as IUs and installed (i.e. install a preference).

Existing Tooling should naturally use the new OmniVersion implementation to parse strings - thus enabling a user to enter a version in raw or format() form. An implementation can choose to present the full version string (i.e. OmniVersion.toString()), or only the original version.
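
A tooling-side alias registry of the kind described here could look roughly like the sketch below. The registry contents are taken from the table of formats above; the names KNOWN_FORMATS, expand_alias and present are hypothetical and only illustrate the "reverse lookup" idea.

```python
# Alias name -> format pattern, as a tool might keep them in its preferences.
KNOWN_FORMATS = {
    "osgi": "n[.n=0;[.n=0;[.S=[a-zA-Z0-9_-];]]]",
    "triplet": "n[.n=0;[.n=0;]][d?S=M;]",
    "string": "S",
    "auto": "a(d?a)*",
}


def expand_alias(text):
    """Turn 'alias:version' into the self-describing 'format(...):version'."""
    name, separator, rest = text.partition(":")
    if separator and name in KNOWN_FORMATS:
        return f"format({KNOWN_FORMATS[name]}):{rest}"
    return text  # not an alias known to this tool


def present(text):
    """Reverse lookup: show the alias instead of a known format pattern."""
    for name, pattern in KNOWN_FORMATS.items():
        prefix = f"format({pattern}):"
        if text.startswith(prefix):
            return f"{name}:{text[len(prefix):]}"
    return text  # unknown format, present the version as it is


stored = expand_alias("triplet:1.0.0")
print(stored)
print(present(stored))
```

Because only the expanded form is stored, a version remains usable by a tool that has no registry entry for the alias.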

More examples using 'format'

A version range with format equivalent to OSGi

format(n[.n=0;[.n=0;[.S=[a-zA-Z0-9_-];]]]):[1.0.0.r12345, 2.0.0]

At least one string, and max 5 strings

format(S=[^.];[.S=[^.];[.S=[^.];[.S=[^.];[.S=[^.];]]]]):vivaldi.opus.spring.bar5
format(S=[^.];(.S=[^.];){0,4}):vivaldi.opus.spring.bar5  => 'vivaldi'.'opus'.'spring'.'bar5'

At least one alpha or numerical with auto format and delimiter

format(a(d?a)*):vivaldi:opus23-spring.bar5  => 'vivaldi'.'opus'.23.'spring'.'bar'.5

The texts 'opus' and 'bar' should not be included:

format(s[.'opus'n[.'bar'n]]):vivaldi.opus23.bar8   => 'vivaldi'.23.8

The first string segment should be ignored - it is a marketing name:

format(s=!;.n(.n)*):vivaldi.1.5.3

Classic SCCS/RCS style:

format(n(.n)*):1.1.1.1.1.1.1.4.5.6.7.8

Max depth 8 of numerical segments (limited classic SCCS/RCS type versions):

format(n(.n){0,7}):1.1.1.1.1.1.1.4

Numeric to optional depth 8, where missing input is set to 0, followed by optional string where 'empty > any'

format(n(d?n=0;){0,7}[a=M;]):1.1.1.4:beta   => 1.1.1.4.0.0.0.0.'beta'
format(n(d?n=0;){0,7}[a=M;]):1.1.1.4   => 1.1.1.4.0.0.0.0.M

Single string range

format(S):[andrea doria,titanic]

Range examples

Examples:

  • raw:[1.2.3.'r1234',2.0.0]
  • [1.2.3.r1234,2.0.0]
  • format(a+):[monkey.fred.ate.5.bananas,monkey.fred.ate.10.oranges]
  • [1.0.0,2.0.0] equal to osgi:[1.0.0,2.0.0]
  • format(S):[andrea doria,titanic]
  • rpm:[7:4.0.3-3.fc9,8:1] - an example KDE Admin version 7:4.0.3-3.fc9 to 8:1
  • triplet:[1.0.0.RC1,1.0.0]

Internationalization

Alphanumerical segments use vanilla string comparison as internationalization (lexical ordering/collation) would produce different results for different users.

Applicability

The generalization of version type applies to objects that by nature may have a different versioning scheme than OSGi. This includes:

  • Installable Unit
  • Provided Capability
  • Required Capability
  • Artifact key

These do not need to be generalized:

  • File format version numbers (content.xml, artifact.xml, etc)
  • Update Descriptor
  • Touchpoint version numbers and touchpoint action versions
  • Publisher advice versions

FAQ

Will users just using Eclipse and OSGi bundles be affected?
No, users that only deal within the OSGi domain can continue to use version strings like before, there is no need to specify version formats.

How does a user of something know which version type to use? This seems very complicated...
To use some non-osgi component with p2, that component must have been made available in a p2 repository. When it was made available, the publisher must have made it available with a specified version format. The publisher must understand the component's version semantics. A consumer that only wants to install the component does not really need to understand the format, and the original version string is probably sufficient. In scenarios where the consumer needs to know more - what to present is domain specific - some tool could show all non osgi version strings as "non-osgi" or "formatted" with drill down into the actual pattern (or if there is an alias registry available, it could reverse lookup the format).

Will open (osgi) ranges produce lots of false positives?
Very unlikely. One decision to minimize the risk was to specify that integer segments are considered to be later than array and string segments. (We also felt that version segments specified with integers are more "precise"). Note that to be included in the range, the required capability would still need to be in a matching name space, and have a matching name. To introduce a false positive, the publisher of the false positive would need to a) publish something already known to others (namespace and name), b) misinterpret how its versioning scheme works, and publish it with a format of n.n.n.n (or n.n.n.s.<something>), c) first learn how to actually specify such a format and how to publish it to a p2 repository, and d) then persuade users to use the repository.

What happens when a capability is available with several versioning schemes?
A typical case would be some java package that is versioned at the source using triplet notation, and the same package is also made available using osgi notation (which btw. is a mistake).

As an example, the following capabilities are found:

  • org.demo.ships triplet:2.0.0
  • org.demo.ships triplet:2.0.0.RC1
  • org.demo.ships osgi:2.0.0
  • org.demo.ships osgi:2.0.0.RC1

(Reminder: in triplet notation 2.0.0.RC1 is older than 2.0.0).

The raw versions will then look like this:

  • 2.0.0.m
  • 2.0.0.'RC1'
  • 2.0.0
  • 2.0.0.'RC1'

And the newest is 2.0.0.m (which is correct for both OSGi, and triplet). When specifying a range, the outcome may depend on if the range is specified with osgi or triplet notation.

  • osgi:[1.0.0,2.0.0] == raw:[1.0.0, 2.0.0] => matches the osgi:2.0.0 version only
  • triplet:[1.0.0,2.0.0] == raw:[1.0.0.m,2.0.0.m] => matches all the versions, and picks 2.0.0.m as it is the latest.

i.e. result is correct (assuming the bits are identical as different artifacts would be picked)

Now look at the lower boundary, and assume that the following versions are the (only) available:

  • org.demo.ships triplet: 1.0.0 == raw: 1.0.0.m
  • org.demo.ships triplet: 1.0.0.RC1 == raw:1.0.0.'RC1'
  • org.demo.ships osgi: 1.0.0 == raw:1.0.0
  • org.demo.ships osgi:1.0.0.RC1 == raw:1.0.0.'RC1'

When specifying ranges:

  • osgi:[1.0.0,2.0.0] == raw:[1.0.0, 2.0.0] => matches all the versions, and picks 1.0.0.m as this is the newest
  • triplet:[1.0.0,2.0.0] == raw:[1.0.0.m,2.0.0.m] results in 1.0.0.m as it is the only available version that matches.

i.e. the result is correct and here the exact same version is picked.
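
The two scenarios above can be checked with a small sketch. It relies on a simplification that holds for these examples only: the versions are flat (no sub arrays) and use the default -M padding, so a shorter version is simply older than any longer version that starts with the same segments, which is how Python compares tuples. The names MAXS, key and matching are hypothetical.

```python
MAXS = object()  # stands for the raw 'm' (max string) segment


def key(segments):
    """Sort key implementing String < MaxString < Integer per segment."""
    result = []
    for segment in segments:
        if segment is MAXS:
            result.append((2, ""))
        elif isinstance(segment, str):
            result.append((1, segment))
        else:
            result.append((4, segment))
    return tuple(result)


def matching(available, low, high):
    """Versions inside the inclusive range [low, high]."""
    return [v for v in available if key(low) <= key(v) <= key(high)]


# Upper boundary: triplet 2.0.0, triplet/osgi 2.0.0.RC1, osgi 2.0.0
upper = [[2, 0, 0, MAXS], [2, 0, 0, "RC1"], [2, 0, 0]]
print(len(matching(upper, [1, 0, 0], [2, 0, 0])))              # osgi range
print(len(matching(upper, [1, 0, 0, MAXS], [2, 0, 0, MAXS])))  # triplet range

# Lower boundary: triplet 1.0.0, triplet/osgi 1.0.0.RC1, osgi 1.0.0
lower = [[1, 0, 0, MAXS], [1, 0, 0, "RC1"], [1, 0, 0]]
newest = max(matching(lower, [1, 0, 0], [2, 0, 0]), key=key)
print(newest == [1, 0, 0, MAXS])
```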

The "worst osgi/triplet crime" that can be committed is publishing an unqualified triplet version as an osgi version (if the same version is not also available as a triplet) as this would make that version older than what it is even when queried using a triplet range.

What if the publisher of a component changes versioning scheme - what happens to ranges?
The order among the versions will be correct as long as the versions are published using the correct notation. The only implication is that users must understand that a query for triplet:1.2.3 means raw:1.2.3.m - e.g. osgi:[1.0.0,2.0.0] != triplet:[1.0.0,2.0.0] (osgi upper range of 2.0.0 would not match triplet published 2.0.0, and triplet lower range of 1.0.0 would not match osgi published 1.0.0).

Why not use regexp instead of the special pattern format?
This was first considered, and would certainly work if the pattern notation was augmented with processing instructions, or if the regexp is specified as a substitution that produces the raw format. Such specifications would typically be much longer and more difficult for humans to read than the proposed format, except possibly for regexp experts :). Another immediate problem is that regexp breaks the current API requirement. It is not included in execution environment CDC-1.1/Foundation-1.1 required by p2.

Pattern parsing looks like it could have performance implications - what are the expectations here?
The intention is to use a mechanism similar to regular expressions - when a format is first seen it is compiled to an internal structure. The compiled structure is cached and reused for all subsequent occurrences of the same format. A test will be performed to compare current parsing of an OSGi version string with the pattern based parsing. Once parsed, all comparisons are made using the raw vector, which should be comparable in speed to the current implementation.

Also note that the Engine does not have to parse and apply the format to the original string unless code explicitly asks for it, and this is not the normal case during provisioning.
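
The compile-once-and-cache idea can be sketched as follows. The "compiled structure" here is only a stand-in (a tuple of the pattern's characters); compile_format and STATS are hypothetical names, and the real internal structure is not described on this page.

```python
from functools import lru_cache

STATS = {"compiled": 0}  # counts how often a pattern is actually compiled


@lru_cache(maxsize=None)
def compile_format(pattern):
    """Compile a format pattern once; later calls hit the cache."""
    STATS["compiled"] += 1
    return tuple(pattern)  # stand-in for the real compiled structure


first = compile_format("n(.n)*")
second = compile_format("n(.n)*")
print(first is second, STATS["compiled"])
```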

Why not just let the publisher deal with transforming the version into canonical form?
The proposal allows this - the publisher is not required to make the format available. We think this is reasonable in domains where humans are not involved in the authoring (or the consumption).

There are several reasons why it is a good idea to include the original version string as well as the format:

  • the original version strings need to be kept as users would probably not understand the canonical representation in many cases.
  • if the transformation pattern is not available a user would not be able to create a request without hand coding the canonical form
  • making the transformation logic used by one publisher available to others would mean that all publishers must have extensions that allow plugging in such logic, and the plugins must be made available

Would it be possible to use the current OSGi version as the canonical form?
The long answer is: To be general, the encoding would need to be made in the qualifier string part of the OSGi version. An upper length for segments must be imposed, numerical sections must be left padded with "0" to that length, and string segments must be right padded with space (else string segment parts may overlap integer segments parts). The selected segment length would need to be big enough to allow the longest anticipated string segment. A fixed length string representation of MAX must be invented. A different implementation would still be needed to be able to keep the original version strings.
The short answer is: no.

Why not use an escape in string segments to be able to have strings with a mix of quotes?
There are several reasons:

  • this would mean that the version string would need to be preprocessed as it would not have \ embedded from the start
  • all version strings that use \ as a delimiter would need to be pre-processed to escape the \
  • to date, authors of this proposal have not seen a version format that requires a mix of quotes
  • In the unlikely event that such strings are present it is possible to concatenate several strings in the raw format.
  • parsing performance is affected

Which format should I use?
If you have the opportunity to select a versioning scheme - stick with OSGi.

External Links
