

Revision as of 14:58, 5 August 2009 by Jos.warmer.ordina.nl (Talk | contribs) (Index Structure)


EMF Index is a component in the Modeling / EMFT project. Have a look at the project proposal for details.

It is currently a work in progress. Some initial code is hosted by the Xtext project, but it is quite rough and very likely to change drastically. There are team project sets for pserver and extssh access. An alternative implementation attempt is currently led by SAP.

The mailing list informs you about the latest news in the project. Additionally, we hold a developers' telephone call every Wednesday at 11 a.m. CET. If you want to join, please drop an email to the mailing list. You'll find the meeting minutes here.

We're currently gathering requirements to have a better basis for our architectural decisions. Feel free to add yours. Please don't forget to put your name on your propositions.

Collected thoughts about the current architecture

Index Structure

We need to be flexible in how we limit the information that goes into the index. Therefore it should be possible to put user data into EMF Index. In CrossX we use an EMF metamodel to define the structure of the CrossX index. Below is a model that is adjusted for EMF Index. Note that the metamodel is not complete: many of the things in the current EMF Index (such as a reference to the originating model element) need to be there as well.

EMF Index metamodel


Each Symbol has a logical name. This is the name that the modeler types into an editor when creating a model, and it can also be used in a code completion proposal. A Symbol also has a type, which is usually the name of the metaclass of the model element the symbol came from. Note that this type could also be defined as an EmfIndexProperty; we have put it directly into Symbol because we use it all the time.

Symbols may have subsymbols, that is, all symbols in the index are hierarchically structured. This is useful for looking up symbols. For example, a DomainClass with DomainAttributes in a model forms a hierarchical structure. In this case we put an identical hierarchical structure into the index. We query the index as follows, for example:

  • LOOKUP classname::attributename. This searches for a Symbol with the name classname and then for a subsymbol with the name attributename.
  • LOOKUP the class Symbol, then request all subsymbols of this class symbol with symbol.getSubSymbols(). This is a navigation within the index. It is typically done to find elements that we need for code completion.
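
The two lookup styles above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the classes `Symbol` and `SymbolIndex`, the `lookup` method and the in-memory lists are hypothetical stand-ins, not the actual EMF Index or CrossX API. Only `getSubSymbols()` and the `::` separator are taken from the text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified stand-in for a Symbol of the index metamodel. */
class Symbol {
    private final String name; // logical name as typed by the modeler
    private final String type; // usually the metaclass name of the originating model element
    private final List<Symbol> subSymbols = new ArrayList<>();

    Symbol(String name, String type) {
        this.name = name;
        this.type = type;
    }

    String getName() { return name; }

    String getType() { return type; }

    /** Adds a child symbol and returns this symbol so calls can be chained. */
    Symbol addSubSymbol(Symbol child) {
        subSymbols.add(child);
        return this;
    }

    /** Navigation within the index, e.g. to collect code completion proposals. */
    List<Symbol> getSubSymbols() {
        return Collections.unmodifiableList(subSymbols);
    }
}

/** Hypothetical index root that resolves qualified names such as "Customer::name". */
class SymbolIndex {
    private final List<Symbol> roots = new ArrayList<>();

    void add(Symbol root) { roots.add(root); }

    /** Returns the symbol for a "::"-separated path, or null if a segment is missing. */
    Symbol lookup(String qualifiedName) {
        List<Symbol> candidates = roots;
        Symbol current = null;
        for (String segment : qualifiedName.split("::")) {
            current = null;
            for (Symbol s : candidates) {
                if (s.getName().equals(segment)) {
                    current = s;
                    break;
                }
            }
            if (current == null) {
                return null; // path cannot be resolved any further
            }
            candidates = current.getSubSymbols(); // descend one level
        }
        return current;
    }
}
```

With a `Customer` symbol of type `DomainClass` that has the subsymbols `name` and `address`, `lookup("Customer::name")` would return the attribute symbol, while `lookup("Customer").getSubSymbols()` would return both attributes for a completion list.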

A Symbol usually comes directly from a model element, but this is optional.

User data

Each symbol may have properties, which are freely added during indexing. Sometimes these represent properties of the originating model element, sometimes aggregated information about the model element. These properties are called User Data in the current EMF Index, and they are crucial to the way we use the index. We try never to open other models, so we want to put all information about a model that might be needed elsewhere into the index.
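
As a rough illustration of the idea, user data can be thought of as a key/value map attached to each symbol. The class `PropertySymbol` and its `put`/`get` methods are hypothetical names chosen for this sketch; string keys and values are an assumption, and the current EMF Index may model user data differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical symbol that carries freely defined user data as key/value properties. */
class PropertySymbol {
    private final String name;
    private final Map<String, String> properties = new LinkedHashMap<>();

    PropertySymbol(String name) {
        this.name = name;
    }

    String getName() { return name; }

    /** Called by the indexer; returns this so several properties can be chained. */
    PropertySymbol put(String key, String value) {
        properties.put(key, value);
        return this;
    }

    /** Called by clients; returns null if the indexer did not record the key. */
    String get(String key) {
        return properties.get(key);
    }
}
```

An indexer could, for instance, record `new PropertySymbol("Customer").put("abstract", "false").put("attributeCount", "2")`. A client in another model can then read `attributeCount` from the index without loading the resource that contains `Customer`.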


Scopes

This section is about search scopes, not the scopes of the in-memory index implementation, which should rather be called caches or indices.

A scope is a defined subset of the whole set of indexed data. By specifying a scope, the search space of a query can be significantly reduced. A natural choice for a scope in the context of EMF would be all the elements extracted from one resource.
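
A minimal sketch of a resource scope follows, assuming that index entries are grouped by the URI of the resource they were extracted from. The class `ScopedIndex`, its method names and the use of plain strings for URIs and element names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical index that keeps one bucket of element names per resource URI. */
class ScopedIndex {
    // resource URI (a plain string in this sketch) -> names extracted from that resource
    private final Map<String, List<String>> byResource = new HashMap<>();

    void add(String resourceUri, String elementName) {
        byResource.computeIfAbsent(resourceUri, k -> new ArrayList<>()).add(elementName);
    }

    /** Unscoped query: every bucket in the index has to be visited. */
    List<String> findAll(String prefix) {
        List<String> result = new ArrayList<>();
        for (List<String> bucket : byResource.values()) {
            collect(bucket, prefix, result);
        }
        return result;
    }

    /** Scoped query: only the bucket of the given resource is searched. */
    List<String> findInResource(String resourceUri, String prefix) {
        List<String> result = new ArrayList<>();
        List<String> bucket = byResource.getOrDefault(resourceUri, Collections.emptyList());
        collect(bucket, prefix, result);
        return result;
    }

    private static void collect(List<String> bucket, String prefix, List<String> out) {
        for (String name : bucket) {
            if (name.startsWith(prefix)) {
                out.add(name);
            }
        }
    }
}
```

In this layout a resource scope costs one map entry and one list per resource, which gives a first intuition for the memory question raised below. User-definable scopes or set operations on scopes would need a more general representation.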


  • Do we need more explicit scopes in EMF Index?
  • Do we need user definable scopes, or are we OK with predefined scopes, such as resource scopes?
  • Do we have to provide set operations for scopes?
  • How do scopes map to database implementations of the EMF Index?
  • SAP guys have also proposed an EClass scope. Do we need that?
  • How grave is the memory penalty of a new scope?

In-memory implementation

EMF Index currently comes with a default in-memory implementation of the index storage. SAP has offered to implement a resource-scoped, pageable implementation, as they have experienced out-of-memory errors with the current serialization implementation.


  • Do we need a URI -> ResourceDescriptor cache?
  • The current implementation uses HashSets to store the descriptors. Should it rather use sorted linked lists or other storage classes that are more efficient in terms of memory or search?
  • The Query API returns Iterables, allowing the actual search operation to be executed lazily. Nevertheless, the default implementation assembles lists eagerly. Should we change the implementation to search lazily as the API suggests? How do we deal with concurrent modifications? Should we clone the search scopes as Sven suggests?
  • Currently, the descriptors are implemented using plain Java classes. Given the problems we're currently facing (e.g. serialization, paging, bi-directional cross references), I am starting to ask myself whether we should rather use an Ecore-based implementation, at least for the in-memory index. EMF offers good solutions, such as proxy resolution, binary serialization, minimal EObject and eOpposites, which could be leveraged.
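
To make the lazy search question more concrete, here is one possible sketch of an Iterable that evaluates its criterion only while it is being iterated and works on a cloned scope. The class name `LazyQuery` is hypothetical and this is not the current default implementation. Note the trade-off: the clone costs memory proportional to the scope, and the copy itself still has to be synchronized with writers by the caller.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical query result that filters lazily over a snapshot of the search scope. */
final class LazyQuery<T> implements Iterable<T> {
    private final List<T> snapshot;
    private final Predicate<T> criterion;

    LazyQuery(Collection<T> scope, Predicate<T> criterion) {
        // Cloning the scope shields the iteration from later modifications of the original.
        this.snapshot = new ArrayList<>(scope);
        this.criterion = criterion;
    }

    @Override
    public Iterator<T> iterator() {
        final Iterator<T> source = snapshot.iterator();
        return new Iterator<T>() {
            private T pending;
            private boolean ready;

            /** Moves forward until a matching element is found or the source is exhausted. */
            private void advance() {
                while (!ready && source.hasNext()) {
                    T candidate = source.next();
                    if (criterion.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
            }

            @Override
            public boolean hasNext() {
                advance();
                return ready;
            }

            @Override
            public T next() {
                advance();
                if (!ready) {
                    throw new NoSuchElementException();
                }
                ready = false;
                T result = pending;
                pending = null;
                return result;
            }
        };
    }
}
```

The criterion is not evaluated when the query is created, only when a client iterates. A client that stops after the first match therefore pays only for the elements visited up to that point.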

Query API

The current query API is limited to AND-concatenated criteria. The fluent API also returns different types, making the construction of a query somewhat tedious.


  • Do we need other logical operations than AND?
  • Should we drop the fluent API and replace it by a QueryBuilder class?
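
One way such a QueryBuilder could look is sketched below. Everything here is hypothetical: `SimpleDescriptor`, the method names and the decision to evaluate against an Iterable are assumptions made for discussion, not a proposal that has been agreed on. The point is that every method returns the same builder type and that OR can be expressed next to AND.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical minimal descriptor with just a name and a type name. */
class SimpleDescriptor {
    final String name;
    final String type;

    SimpleDescriptor(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

/** Hypothetical builder: each call returns the builder itself, never a different type. */
class QueryBuilder {
    private Predicate<SimpleDescriptor> predicate = d -> true; // matches everything initially

    QueryBuilder nameEquals(String name) {
        return and(d -> d.name.equals(name));
    }

    QueryBuilder typeEquals(String type) {
        return and(d -> d.type.equals(type));
    }

    /** Narrows the query: the existing criteria AND the given one must hold. */
    QueryBuilder and(Predicate<SimpleDescriptor> criterion) {
        predicate = predicate.and(criterion);
        return this;
    }

    /** Widens the query: either the criteria so far OR those of the other builder must hold. */
    QueryBuilder or(QueryBuilder other) {
        predicate = predicate.or(other.predicate);
        return this;
    }

    /** Evaluates the query eagerly against the given scope, preserving iteration order. */
    List<SimpleDescriptor> executeOn(Iterable<SimpleDescriptor> scope) {
        List<SimpleDescriptor> result = new ArrayList<>();
        for (SimpleDescriptor d : scope) {
            if (predicate.test(d)) {
                result.add(d);
            }
        }
        return result;
    }
}
```

For example, `new QueryBuilder().typeEquals("DomainClass").nameEquals("Customer").or(new QueryBuilder().typeEquals("DomainAttribute"))` selects the class named Customer as well as every attribute.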


SAP guys have proposed to introduce an EReference descriptor (see this bug).