Skip to main content

Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.

Jump to: navigation, search

Difference between revisions of "Equinox/p2/Query Language for p2"

< Equinox‎ | p2
(Constructors)
(Collection functions)
Line 143: Line 143:
 
<li>'''Reject'''<br/>
 
<li>'''Reject'''<br/>
 
The '''reject''' function is similar to '''select''' but instead, returns the elements for which the evaluation result was ''false''.
 
The '''reject''' function is similar to '''select''' but instead, returns the elements for which the evaluation result was ''false''.
 +
</li>
 +
<li>'''Collect'''<br/>
 +
The '''collect''' collects the result of evaluating each element in a new collection. Example:
 +
<pre>collect(x | x.id)</pre>
 +
returns a collection consisting of the value of the id property of each element in the original collection.
 
</li>
 
</li>
 
<li>'''Exists'''<br/>
 
<li>'''Exists'''<br/>

Revision as of 12:19, 30 November 2009

Background

As p2 gets more widely used, we predict that repositories will grow larger and larger over time. The Galileo repository alone contains 3866 IU's (in one single service release). Starting with Helios, we plan to keep everything over time which will put the IU count close to 10.000. And that's just one year of Eclipse IP approved material. OSGi is gaining in popularity and p2 is getting increasingly popular. Some companies plan to map the central maven repository as p2. We can soon expect to see p2 repositories with over a 100.000 IU's in them.

The problem

p2 has a query mechanism today that makes it hard to create a repository implementation that is based on a database. It is also diffiult to create an efficient client/server solution. The reason for this is that most of the queries are written as implementations of the IQuery interface, either as ContextQuery derivates that need access to all rows in the database, or as MatchQuery derivates with a java method to which all rows need to be passed.

There is no uniform way to translate these queries into an expression that can be understood by a database or by something running in another process.

Some attempts has been made to rectify this and a repository implementation could potentially make some intelligent decisions based on information derived from looking at the query (what class it is, and in some cases, what properties that are exposed). While this is an improvement, the general problem remains; A large amount of the queries are still more or less completely black-boxed and it's enough with one single such query to force a full selection of everything from the database/remote server.

The p2QL expression language discussed here is an attempt to rectify this problem. It can be used in two ways:

  • Coexisting with current IQuery / IQueryable
    Using the ExpressionQuery (a ContextQuery subclass), or the PredicateQuery (a MatchQuery subclass), it can coexist with todays IQuery/IQueryable. The queries are created based on an expression and an array of parameters. See section #IQuery examples.
  • With more explicit separation of concerns
    It can also be used in an alternative solution where an expression returns an Iterator. This usage scenario is particularly useful when implementing remote repositories or implementations on top of a database. It is also a very efficient way of querying for things that you don't want to collect in memory.

Bugzilla

We discussed the use of a query language at length on the p2 meeting on November 9 and November 16. The issue is captured in bugzilla Create a QueryLanguage for p2.

Language Design

TODO: Add some text here to explain where the ideas stem from (xtend, xtext, Scala, Java) and other motivation behind the choice of syntax.

Predicate Queries

Queries can be written as simple predicates such as "id == $0". When executed, the predicate will be evaluated once for each row in the queried material. This form is used for the IMatchQuery implementation. The current value is available in a predefined variable named item. Normally, item does not need to be specified so that:

id == $0

is actually the same as:

item.id == $0

Context Queries

A query can also be declared as something that acts on "everything". This is particularly useful when things like search for latest or doing requirements traversal. These queries make use of a predefined variable named everything. When accessed, that variable returns an iterator over all queried material. As with item in a predicate query, everything is something that you don't normally need to specify, i.e.:

select(x | x.id == $0)

is the same as:

everything.select(x | x.id = $0)

A predicate query can always be written as a context query. These two queries are fully equivalent:

Predicate: id == $0
Context:   select(x | x.id = $0)

Special operators

  • Matches operator '~='

    A large amount of queries involve versions, version ranges, and capability matching. So managing that is important. Another thing that is important is to be able to support filtering based on ID's. All of this is now also built in to the language through the matches operator ~=. It can be though of as the 'satisfies' method on an installable unit or provided capability, as 'is included in' when used with a version and version range, as a plain matches when used with a string and a pattern, or as 'instanceof' when comparing with a class.

    The following things can be matched using the operator

    LHS RHS Implemented as
    IProvidedCapability IRequiredCapability lhs.satisfies(rhs)
    IInstallableUnit IRequiredCapability lhs.satisfies(rhs)
    Version VersionRange rhs.isIncluded(lhs)
    IInstallableUnit Filter rhs.matches(lhs.properties)
    IInstallableUnit IUpdateDescriptor rhs.isUpdateOf(lhs)
    Map Filter rhs.match(lhs)
    String Pattern rhs.matcher(lhs).matches()
    <any> Class rhs.isInstance(lhs)
    Class Class rhs.isAssignableFrom(lhs)
  • And operator '&&'
    This operator checks if the first operand evaluates to a boolean. If it is, the it is assumed that all other operands also are booleans and the full evaluation is true if all its operands evaluate to true. If the first result was not a boolean, then it is assumed that it is a collection and that all other operands also evaluates collections. The operator will then function as a intersect operator and the result consists those elements that could be found in all collections.
  • Or operator '||'
    This operator checks if the first operand evaluates to a boolean. If it is, the it is assumed that all other operands also are booleans and the full evaluation is false if none of its operands evaluate to true. If the first result was not a boolean, then it is assumed that it is a collection and that all other operands also evaluates collections. The operator will then function as a union operator and the result is the unique sum of all elements from all collections.

Functions

A small number of functions are available to assist with some common problems. Example: An IRequiredCapability.getFilter() returns a String. I only want the capabilities that has a filter that match certain properties. Consequently, I need an instance of an OSGi Filter. Not a string. I can do this using the filter function. Like this:

requiredCapabilities.exists(rc | rc.filter == null || $1 ~= filter(rc.filter))

The currently available constructors are:

name arguments creates
filter string an instance of org.osgi.framework.Filter
version string a p2 Version
range string a p2 VersionRange
class string a Class
set comma separated list of expressions an instance of java.util.HashSet
iquery IQuery [, variable [, collector]] The result of evaluating the query
localizedProperty locale, installableUnit, key A localized property for the given IU. See discussion below on #Localization

A note about the iquery constructor.
The constructor operates in one of two modes depending on its IQuery argument.

If the argument implements IMatchQuery, the the constructor will return the boolean result of invoking the isMatch(value) method on that query. The value is picked from the variable and the default variable is 'item'. It is considered an error to pass a third argument in combination with an IMatchQuery.

When the query does not implement the IMatchQuery interface, it is considered a context query and its method perform(iterator, collector) will be called. The iterator is picked from the variable and the default variable is 'everything'. The collector can be passed in as the third argument. If no third argument is present, a new collector will be created.

Collection functions

A number of functions was added to enable various manipulations of collections. A common way of writing collection functions is:

elements.someFunction(element | <expression>)

Here are the functions that are available in the p2QL language:

  • Select
    The select function will evaluate its expression, once for each element, and return a new collection containing all elements for which the evaluation result was true. Example:
    select(x | x.id == 'org.eclipse.equinox.p2.ql')
  • Reject
    The reject function is similar to select but instead, returns the elements for which the evaluation result was false.
  • Collect
    The collect collects the result of evaluating each element in a new collection. Example:
    collect(x | x.id)

    returns a collection consisting of the value of the id property of each element in the original collection.

  • Exists
    The exists function will evaluate an expression for each element in the collection until an evaluation yields true. If that happens, the result of the whole evaluation yields true. If no such element is found, the whole evaluation yields false. Example:
    providedCapabilities.exists(p | p ~= $0)
  • All
    The all function will evaluate an expression for each element until an evaluation yields false. If that happens, the whole evaluation yields false. If no such element is found, the whole evaluation yields true. Example:
    $0.all(rc | item ~= rc)
    Assuming $0 is a list of required capabilities, this query asks for items that fulfill all requirements.
  • Latest
    Some queries must make sure that the result only contains the latest version of each found IU. The special function latest to makes that possible. The function will require that the collection elements implement the IVersionedId interface. Here is an example of how to use it:
    select(x | x.id = $0).latest()

    As a convenience, it is also possible to write:

    latest(x | x.id = $0)
  • Limit
    It is sometimes desirable to limit the number of rows returned by a query. The function 'limit' can be used for that:
    select(...).limit(100)
  • Unique
    This function will ensure that the resulting collection contains unique elements. The function can operate in two modes, with or without a cache argument. I.e.
    x.unique()

    or

    x.unique(cache)
    The latter expects the argument to be a java.util.Set that it can use to enforce the uniqueness of the element. This enables the result to be unique in a larger scope then the collection itself.
  • Traverse
    A common scenario in p2 is that you want to start with a set of roots and then find all items that fulfill the root requirements. Those items in turn introduce new requirements so you want to find them too. The process continues until no more requirements can be satisfied. This type of query can be performed using the traverse function. The function will evaluate an expression, once for each element, collect elements for which the evaluation returned true, then then re-evaluate using the collected result as source of elements. No element is evaluated twice. This continues until no more elements are found:
    $0.traverse(parent | select(child | parent.requiredCapabilities.exists(rc | child ~= rc)))
    This is of course a fairly naive slicing mechanism. See #Currying below for more advanced examples.

Query parameters

The query must be parameterised so that expression parsing can be done once and then executed multiple times. A parameter is expressed as $n where n is either the parameter index, originating from 0, or an identifier. When identifiers are used, the parameters should be passed in a Map.

Basic IQuery examples

Here are some examples of how to use the expressions with IQuery: Query for all IU's that has an id:

IQuery query = new PredicateQuery("id = $0", id);

Query for the latest IU of some specific id:

IQuery query = new ExpressionQuery("latest(x | x.id = $0)", id);

Query an artifact repository for all keys with a specific classifier:

IQuery query = new PredicateQuery(IArtifactKey.class, "classifier = $0", classifier);

Query for the latest IU that matches a specific version range. Since the second parameter is a VersionRange, the ~= operator is interpreted as isIncluded:

IQuery query = new ExpressionQuery("latest(x | x.id = $0 && x.version ~= $1)", id, range);

Query for an IU that has a specific property set:

IQuery query = new PredicateQuery("properties[$0] = $1", key, value);

The same query, but this time for multiple possible values:

IQuery query = new PredicateQuery("$1.exists(v | properties[$0] = v)", key, new Object[] { v1, v2, v3 });

Query for all categories found in the repository:

IQuery query = new PredicateQuery("properties[$0] = 'true'", IInstallableUnit.PROP_TYPE_CATEGORY);

Query for all IU's that fulfil at least one of the requirements from another IU. Since the first parameter is a list of IRequiredCapabilites, the ~= applied to each combination of provided / required capability is intepreted as satisfies.

IQuery query = new PredicateQuery("$0.exists(rc | providedCapabilities.exists(pc ~= rc))", iu.getRequiredCapabilities());
OR
IQuery query = new PredicateQuery("$0.exists(rc | item ~= rc)", iu.getRequiredCapabilities());

Query for the latest version of all patches:

IQuery query = new ExpressionQuery("latest(x | x ~= $0)", IInstallableUnitPatch.class);

Query for all IU's affected by a patch:

IQuery query = new PredicateQuery("$0.exists(rcs | rcs.all(rc | item ~= rc)))", patch.getApplicabilityScope());

Currying

Want some add some spice? Probably...

Consider the traversal function and the example we had above:

$0.traverse(parent | select(child | parent.requiredCapabilities.exists(rc | child ~= rc)))

Now let us assume that we want to perform this traversal with filtering of the required capabilities. Each capability has an optional filter that determines if the requirement is active or not. The filter of an IRequiredCapability is in string form so the filter(<expr>) constructor comes in handy here.

Our first attempt looks like this

$0.traverse(parent | select(child |
  parent.requiredCapabilities.exists(rc | (rc.filter == null || $1 ~= filter(rc.filter)) && child ~= rc)))
The intended arguments are a $0 = collection of InstallableUnits (the roots) and $1 = a Map with settings to be matched against the filter (typically osgi.os, osgi.ws, osgi.arch, osgi.nl).

The execusion of this query might be a bit slow due to the fact that the valid required capabilities for each parent of the traversal must be computed once for each child. It would be better if we could split this up into two expressions. One to execute once for each new parent:

parent.requiredCapabilities.exists(rc | rc.filter == null || $1 ~= filter(rc.filter))

and another to execute for each child.

rcs.exists(rc | child ~= rc)

Luckily, the p2QL syntax permits something generally referred to as currying. Currying means that you can provide more parameters then the single one that represents the current element to the expression of a collection function. So far, we've seen examples using the syntax:

select(x | <do something with x>)

This is actually a short form for the longer:

select(_, {x | <do something with x>})

The select provides one parameter to each iteration. It's value is always provided reachable using the special operator '_'. In this case, the variable x maps to the parameter _ since they have the same positional offset. I can add more parameters by declaring:

select(a, b, _, {x, y, z, <do something with x, y, z>})
Variable x will now have the value of a, y the value of b, and z the value of _.

So why is this important? Well, the initializers are just evaluated once for one call to select. The expression however, is evaluated once for each new value of _.<p> <p>Let us now return to the traversal again. Beacuse with this syntax, we can actually specify exactly what we wanted. That the required capabilities are filtered once.

$0.traverse(parent | select(
  parent.requiredCapabilities.exists(rc | rc.filter == null || $1 ~= filter(rc.filter)), _,
  {rcs, child | rcs.exists(rc | child ~= rc)}))

Other ways of optimizing

Using a cache to limit processing

We can actually speed things up a bit more. During this traversal, it is not particularly useful to query for matches to the same required capability more then once. Yet, many IU's have many requirements in common. So what if we could specify something globally that would keep track of the required capabilities that has been perused already and discriminate them? Well, we can. All we need to do is to pass a set from a higher scope to a unique function that we apply on the requirements. Like this:

$0.traverse(parent | select(
  parent.requiredCapabilities.exists(rc | rc.filter == null || $1 ~= filter(rc.filter)).unique($2), _,
  {rcs, child | rcs.exists(rc | child ~= rc)}))
The $2 parameter in this case would be an instance of java.util.HashSet.

A drawback with this approach is that it is somewhat challenging to pass to a remote server. A better alternative is to use the set constructor without arguments in combination with currying. Like this:

$0.traverse(set(), _, { uniqueCache, parent | select(
  parent.requiredCapabilities.exists(rc | rc.filter == null || $1 ~= filter(rc.filter)).unique(uniqueCache), _,
  {rcs, child | rcs.exists(rc | child ~= rc)})})

Performance comparison: p2QL traversal versus the p2 Slicer

A test was performed using the query example above on the Galileo repository, using a specific version of the org.eclipse.sdk.feature.group IU as the root. The test produced a result with 411 IU's in 3,362 ms. I compared this with another test that used the p2 Slicer. The number of IU's produced was exaclty the same. The slicer however, took 6,723 ms. Not very scientific perhaps but repeated tests produce similar results.

I'm sure the Slicer can be optimized and achieve results that might be even slightly better then the traversal but I still think this proves a point. p2QL Performance is good although no optimization has been made to the evaluator yet.

Java API

The expression tree created by the parser must be well documented and easy to use so that queries can be created programmatically. Since all expressions are immutable and without context, they can be combined freely. Hence, code like this is fully possible:

// Create some expressions. Note the use of identifiers instead of
// indexes for the parameters

Variable item = new Variable("item");
Expression cmp1 = new Equals(new Member(item, "id"), new KeyedParameter("id"));
Expression cmp2 = new Equals(
   new At(
      new Member(item, "properties"), new KeyedParameter("propKey")),
   new KeyedParameter("propValue"));

Variable everything = new Variable("everything");
LambdaExpression lambda = new LambdaExpression(new And(new Expression[] {cmp1, cmp2}), item);
Expression latest = new Latest(new Select(everything, lambda));
ContextExpression e3 = new ContextExpression(everything, latest);

// Put the parameters in a map
Map args = new HashMap();
args.put("id", "test.bundle");
args.put("propKey", "org.eclipse.equinox.p2.type.group");
args.put("propValue", "true");

// Create the query
IQuery query = new ExpressionQuery(e3, args);

The BNF

condition
	: orExpression ( '?' orExpression ':' orExpression )?
	;

orExpression : andExpression ( '||' andExpression )* ;

andExpression : binaryExpression ( '&&' binaryExpression )* ;

binaryExpression : notExpression ( op notExpression )?;

op : '=' | '!=' | '>' | '>=' | '<' | '<=' | '~=' ;

notExpression
	: '!' notExpression
	| collectionExpression
	;

collectionExpression
	: memberExpression ( '.' collectionFunction )*
	;

memberExpression : constructor ( ( '.' ID ) | ( '[' memberExpression ']' ) )* ;

constructor
	: ( filter | version | range | class ) '(' unaryExpression ')'
	| set '(' ( collectionExpression ( ',' collectionExpression )* )? ')'
	| unaryExpression
	;

collectionFunction
	: ( select | reject | exists | all | traverse ) '(' lambdaDefinition ')'
	| limit '(' memberExpression ')'
	| unique '(' memberExpression? ')'
	| latest '(' lambdaDefinition? ')'
	;

lambdaDefinition
	: initializer ( ',' initializer )* ( ',' '{' lambda '}' )?
	| '{' lambda '}'
	| lambda
	;

initializer
	: '_'
	| condition
	;

lambda
	: ( ID ( ',' ID )* )? '|' condition
	;

unaryExpression
	: '(' condition ')'
	| '[' condition ( ',' condition )* ']' // #array construct
	| '/' regexpPattern '/'
	| STRING
	| INT
	| parameter
	| 'null'
	| 'true'
	| 'false'
	| ID
	;

parameter
	: '$' INT | ID
	;

Back to the top