Skip to main content
Jump to: navigation, search

Difference between revisions of "Equinox/p2/Query Language for p2"

< Equinox‎ | p2
(IQuery examples)
Line 71: Line 71:
 
<li>'''Limit'''<br/>
 
<li>'''Limit'''<br/>
 
It is sometimes desirable to limit the number of rows returned by a query. I added a special keyword 'limit' to cover that.</li>
 
It is sometimes desirable to limit the number of rows returned by a query. I added a special keyword 'limit' to cover that.</li>
<li>'''Type'''<br/>
 
Denotes the type to query for. The expression must either be a string or a class. The type defaults to IInstallableUnit. For compatibility, the type will also ensure that requests for the properties IArtifactQuery.ACCEPT_KEYS and IArtifacteQuery.ACCEPT_DESCRIPTORS yields true when the type is IArtifactKey or IArtifactDescriptor.</li>
 
 
<li>'''Any''' and '''All'''<br/>
 
<li>'''Any''' and '''All'''<br/>
 
<p>These operators can only be used in conjunction with a binary expression. Both operators expects that the operand is some form of collection (array, collection, or iterator). The '''Any''' operator will evaluate the binary expression for each element in the collection until an evaluation yields true. If that happens, the result of the any-evaluation yields true. If no such element is found, the any-evaluation yields false. The '''All''' operator will evaluate the binary expression for each element until an evaluation yields false. If that happens, the all-evaluation yields false. If no such element is found, the all-evaluation yields true.</p>
 
<p>These operators can only be used in conjunction with a binary expression. Both operators expects that the operand is some form of collection (array, collection, or iterator). The '''Any''' operator will evaluate the binary expression for each element in the collection until an evaluation yields true. If that happens, the result of the any-evaluation yields true. If no such element is found, the any-evaluation yields false. The '''All''' operator will evaluate the binary expression for each element until an evaluation yields false. If that happens, the all-evaluation yields false. If no such element is found, the all-evaluation yields true.</p>
Line 161: Line 159:
 
===The [http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form BNF]===
 
===The [http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form BNF]===
 
<pre>
 
<pre>
p2query : type limit? 'latest'? orExpression ;
+
p2query : limit? 'latest'? orExpression ;
 
+
type : 'type' expression ; # specify what type or types (as array) that is accepted by the query
+
  
 
limit : 'limit' INT | parameter ; # limit the number of returned rows
 
limit : 'limit' INT | parameter ; # limit the number of returned rows

Revision as of 10:06, 17 November 2009

Background

As p2 gets more widely used, we predict that repositories will grow larger and larger over time. The Galileo repository alone contains 3866 IU's (in one single service release). Starting with Helios, we plan to keep everything over time which will put the IU count close to 10.000. And that's just one year of Eclipse IP approved material. OSGi is gaining in popularity and p2 is getting increasingly popular. Some companies plan to map the central maven repository as p2. We can soon expect to see p2 repositories with over a 100.000 IU's in them.

The problem

p2 has a query mechanism today that makes it hard to create a repository implementation that is based on a database. It is also diffiult to create an efficient client/server solution. The reason for this is that most of the queries are written as implementations of the IQuery interface, either as ContextQuery derivates that need access to all rows in the database, or as MatchQuery derivates with a java method to which all rows need to be passed.

There is no uniform way to translate these queries into an expression that can be understood by a database or by something running in another process.

Some attempts has been made to rectify this and a repository implementation could potentially make some intelligent decisions based on information derived from looking at the query (what class it is, and in some cases, what properties that are exposed). It is an improvement but the implementation still needs to be very specialized and the general problem remains; A large amount of the queries are still more or less completely opaque and it's enough with one single such query to force a full selection of everything from the database/remote client.

The p2ql expression language discussed here is an attempt to rectify this problem. It can be used in two ways:

  • Coexisting with current IQuery / IQueryable
    Using the ExpressionQuery (a ContextQuery subclass), it can coexist with todays IQuery/IQueryable. The ExpressionQuery is created based on an expression and a list of parameters. The expression can be in two forms. Its either an Expression instance or a String which then gets parsed into an Expression. See section #IQuery examples.
  • With more explicit separation of concerns
    It can also be used in an alternative solution where an IQueryable simply extracts a result based on a predicate expression, and a ICollect instance that deals with further processing of that result (involving collectors, visitors, and the like). More about this in section #An alternative way of doing things

Bugzilla

We discussed the use of a query language at length on the p2 meeting on November 9 and November 16. The issue is captured in bugzilla Create a QueryLanguage for p2.

Considerations

When designing the language I tried to consider two things. Who is the primary user and which are the most common queries?

Java style operators

The primary user is typically well aqquinted with Java so I opted for using java operators such as '&&' and '||' rather then SQL'ish keywords like OR and AND. I also use '.' for member access and '[' ']' to get indexed access. The latter can also be used to get keyed access.

Operators with special meaning

  • Matches operator

    A large amount of queries involve versions, version ranges, and capability matching. So managing that is important. Another thing that is important is to be able to support filtering based on ID's. All of this is now also built in to the language through the matches operator '~='. It can be though of as the 'satisfies' method on an installable unit or provided capability, as 'is included in' when used with a version and version range, as a plain matches when used with a string and a pattern, or as 'instanceof' when comparing with a class.

    The following things can be matched using the operator

    LHS RHS Implemented as
    IProvidedCapability IRequiredCapability lhs.satisfies(rhs)
    IInstallableUnit IRequiredCapability lhs.satisfies(rhs)
    Version VersionRange rhs.isIncluded(lhs)
    IInstallableUnit Filter rhs.matches(lhs.properties)
    Map Filter rhs.match(lhs)
    String Pattern rhs.matcher(lhs).matches()
    <any> Class rhs.isInstance(lhs)
    Class Class rhs.isAssignableFrom(lhs)
  • Latest
    Some queries must make sure that the result only contains the latest version of each found IU. I added a special keyword 'latest' to make that possible.
  • Limit
    It is sometimes desirable to limit the number of rows returned by a query. I added a special keyword 'limit' to cover that.
  • Any and All

    These operators can only be used in conjunction with a binary expression. Both operators expects that the operand is some form of collection (array, collection, or iterator). The Any operator will evaluate the binary expression for each element in the collection until an evaluation yields true. If that happens, the result of the any-evaluation yields true. If no such element is found, the any-evaluation yields false. The All operator will evaluate the binary expression for each element until an evaluation yields false. If that happens, the all-evaluation yields false. If no such element is found, the all-evaluation yields true.

    Any and All can be nested if the operand is a multi-dimensional collection (as is the case with patch applicability for instance).

Combined queries

It must also be possible to combine queries (hence && and ||).

Query parameters

The query must be parameterised so that expression parsing can be done once and then executed multiple times. A parameter is expressed as {n} where n is the parameter index, originating from 0.

Java API

The expression tree created by the parser must be well documented and easy to use so that queries can be created programmatically.

Performance

Both the parser and the evaluator must be very fast. Performance must be comparable with the current solution.

IQuery examples

Think of the query string as something that would be entered after the WHERE keyword in an SQL query, i.e.

SELECT self FROM repository WHERE ...

The query language will only cover the '...' part since the rest is implicit. So here are some examples of query expressions:

Query for all IU's that has an id:

IQuery query = new ExpressionQuery("id = {0}", id);

Query for the latest IU of some specific id:

IQuery query = new ExpressionQuery("latest id = {0}", id);

Query an artifact repository for all keys with a specific classifier:

IQuery query = new ExpressionQuery(IArtifactKey.class, "classifier = {0}", classifier);

Query for the latest IU that matches a specific version range. Since the second parameter is a VersionRange, the ~= operator is interpreted as isIncluded:

IQuery query = new ExpressionQuery("latest id = {0} && version ~= {1}", id, range);

Query for an IU that has a specific property set:

IQuery query = new ExpressionQuery("properties[{0}] = {1}", key, value);

The same query, but this time for multiple possible values:

IQuery query = new ExpressionQuery("properties[{0}] = ANY {1}", key, new Object[] { v1, v2, v3 });

Query for all categories found in the repository:

IQuery query = new ExpressionQuery("properties[{0}] = 'true'", IInstallableUnit.PROP_TYPE_CATEGORY);

Query for all IU's that fulfil at least one of the requirements from another IU. Since the first parameter is a list of IRequiredCapabilites, the ~= applied to each combination of provided / required capability is intepreted as satisfies.

IQuery query = new ExpressionQuery("ANY providedCapabilities ~= ANY {0}", iu.getRequiredCapabilities());
OR
IQuery query = new ExpressionQuery("self ~= ANY {0}", iu.getRequiredCapabilities());

Query for the latest version of all patches:

IQuery query = new ExpressionQuery("latest self ~= {0}", IInstallableUnitPatch.class);

Query for all IU's affected by a patch:

IQuery query = new ExpressionQUery("self ~= ANY ALL {0}", patch.getApplicabilityScope());

I think it would be valuable if users of p2 that have the need for specific queries could list them here. It is not unlikely that some queries might motivate some extension to the current syntax or to how the evaluator works.

An alternative way of doing things

Passing instances of the IQuery to an IQueriable is problematic since it is very difficult for the IQueryable implementer to figure out what the IQuery does. When the implementation is based on a database or a remote server, the query needs to be converted into some kind of generic expression. Some special code would need to be written that would make a best effort. It might be able to recognize some queries and then process others locally. Often with very varying performance. Executing a query locally means passing all data over the net or extracting all tuples from a database. For huge repositories, that will be a very resource intensive operation.

Todays IQuery does have one big advantage though. It can contain java code that does virtually anything to filter the result or even perform actions based on the result. Needless to say, such filtering, or such actions, cannot be performed by a database or a remote server. So clearly, the concept currently encapsulated by the IQuery/IQueryable is needed.

Here is a proposal on an alternative solution that would get the best of both worlds. In essence, each query would start with the evaluation of a query expression. Then, if needed, the query result is further processed by a construct very similar to the current IQuery/IQueryable.

Simplified, the solution would be based on the following two interfaces:

public interface IQueryable {
   Iterator query(String expression, Object[] args, IProgressMonitor monitor);
   Iterator query(Expression expression, Object[] args, IProgressMonitor monitor);
}

public interface ICollector {
   Collector collect(Iterator iterator, Collector collector);
}

The ICollector is basically the same thing as the current IQuery. The key thing here is the separation of concern. An IQueriable can return a result based on a predicate. The ICollector does something with that result.

Today, you would write something like:

MyQuery query = new MyQuery(collectorArgs, queryArgs);
Collector coll = repo.query(query, new Collector(), monitor);

If the query does things that must be done locally and is not covered by the query expression language, this would now be replaced by:

MyCollector collector = new MyCollector(collectorArgs);
Collector coll = collector.collect(repo.query("some expression", queryArgs, monitor), new Collector());

Note though, that any code that just wishes to iterate over the result, would not need the collector at all. It would still be possible to just write:

Iterator itr = repo.query("some expression", queryArgs, monitor);
while(itr.hasNext()) {
   ...
}

and judging from the current usage of the query, that would be the most common way of doing things. The places where special handling by a collector is needed are very rare.

We could also allow a more generic Iterator filter, i.e. an Iterator implementation that uses an inner Iterator. With two basic implementations, ContextIteratorFilter and MatchIteratorFilter (corresponding to the current ContextQuery and MatchQuery) such filters would assist in writing general purpose filters to execute locally.

The BNF

p2query : limit? 'latest'? orExpression ;

limit : 'limit' INT | parameter ; # limit the number of returned rows

orExpression : andExpression ( '||' andExpression )* ;

andExpression : notExpression ( '&&' notExpression )* ;

notExpression
   : '!' notExpression
   | binaryExpression
   ;

op : '=' | '!=' | '>' | '>=' | '<' | '<=' | '~=' ;

binaryExpression : sideExpression op sideExpression ;

sideExpression : ('any' | 'all')* memberExpression ;

memberExpression : unaryExpression ( ( '.' ID ) | ( '[' memberExpression ']' ) )* ;

unaryExpression
   : '(' orExpression ')'
   | '[' memberExpression ( ',' memberExpression )* ']' // #array construct
   | '/' regexpPattern '/'
   | STRING
   | INT
   | parameter
   | 'self'
   | 'null'
   | 'true'
   | 'false'
   | ID // # implies self.identifier
   ;

   parameter
   : '{' INT '}'
   ;

The expression tree model

P2ql.png

Back to the top