EmfIndex

EMF Index is a component in the Modeling / EMFT project. Have a look at the project proposal for details.

EMF Index is currently a work in progress. Some initial code is hosted by the Xtext project, but it is quite rough and very likely to change drastically. There are team project sets for pserver and extssh access. Contributions are planned to start after the Galileo release.

Collected thoughts about the current architecture

Scopes

This section is about search scopes, not the scopes of the in-memory index implementation, which should rather be called caches or indices.

A scope is a defined subset of the whole set of indexed data. By specifying a scope, the search space of a query can be significantly reduced. A natural choice for a scope in the context of EMF would be all the elements extracted from one resource.
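To illustrate how a resource scope narrows the search space, here is a minimal sketch. It assumes the index keeps one bucket of elements per resource URI; `IndexedElement` and `ResourceScopedIndex` are hypothetical names invented for this example and are not part of the actual EMF Index code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for an indexed element; not an actual EMF Index type.
final class IndexedElement {
    final String resourceUri;
    final String name;

    IndexedElement(String resourceUri, String name) {
        this.resourceUri = resourceUri;
        this.name = name;
    }
}

// Hypothetical index that groups elements by the resource they were extracted from.
final class ResourceScopedIndex {
    private final Map<String, List<IndexedElement>> byResource = new HashMap<>();

    void add(IndexedElement element) {
        byResource.computeIfAbsent(element.resourceUri, uri -> new ArrayList<>()).add(element);
    }

    // Unscoped query: every bucket has to be visited.
    List<IndexedElement> query(Predicate<IndexedElement> criterion) {
        List<IndexedElement> result = new ArrayList<>();
        for (List<IndexedElement> bucket : byResource.values()) {
            for (IndexedElement element : bucket) {
                if (criterion.test(element)) {
                    result.add(element);
                }
            }
        }
        return result;
    }

    // Resource-scoped query: only the bucket of the given resource is visited.
    List<IndexedElement> query(String resourceUri, Predicate<IndexedElement> criterion) {
        List<IndexedElement> result = new ArrayList<>();
        for (IndexedElement element : byResource.getOrDefault(resourceUri, List.of())) {
            if (criterion.test(element)) {
                result.add(element);
            }
        }
        return result;
    }
}
```

With this layout, the scoped query touches only the elements of one resource, while the unscoped query scans the whole index.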

Questions:

Do we need more explicit scopes in EMF Index?
Do we need user definable scopes, or are we OK with predefined scopes, such as resource scopes?
Do we have to provide set operations for scopes?
How do scopes map to database implementations of the EMF Index?
SAP guys have also proposed an EClass scope. Do we need that?
How grave is the memory penalty of a new scope?
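As a basis for discussing the set operation and EClass scope questions, the following sketch models a scope as a membership test and composes scopes with union, intersection and difference. All names (`SearchScope`, `ScopedElement`, the factory methods) are assumptions made up for this example; nothing here reflects an existing EMF Index API.

```java
// Hypothetical stand-in for an indexed element; not an actual EMF Index type.
final class ScopedElement {
    final String resourceUri;
    final String eClassName;

    ScopedElement(String resourceUri, String eClassName) {
        this.resourceUri = resourceUri;
        this.eClassName = eClassName;
    }
}

// Hypothetical scope abstraction: a scope decides whether an element belongs to it.
interface SearchScope {
    boolean contains(ScopedElement element);

    // Predefined scope: all elements extracted from one resource.
    static SearchScope resource(String resourceUri) {
        return element -> element.resourceUri.equals(resourceUri);
    }

    // Predefined scope: all elements that are instances of one EClass (matched by name here).
    static SearchScope eClass(String eClassName) {
        return element -> element.eClassName.equals(eClassName);
    }

    // Set operations produce new scopes without copying any indexed data.
    default SearchScope union(SearchScope other) {
        return element -> contains(element) || other.contains(element);
    }

    default SearchScope intersection(SearchScope other) {
        return element -> contains(element) && other.contains(element);
    }

    default SearchScope difference(SearchScope other) {
        return element -> contains(element) && !other.contains(element);
    }
}
```

For example, `SearchScope.resource("a.ecore").intersection(SearchScope.eClass("EClass"))` selects the EClass instances of a single resource. In this sketch a composed scope only references its two operands, so it adds little memory by itself, but it also does not reduce the number of elements that have to be tested unless the index can exploit the scope structure.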

In-memory implementation

EMF Index currently comes with a default in-memory implementation for the index storage. SAP has offered to implement a resource-scoped, pageable implementation, as they have experienced out-of-memory errors with the current serialization implementation.

Questions:

Do we need a URI -> ResourceDescriptor cache?
The current implementation uses HashSets to store the descriptors. Should it rather use a sorted linked list or other storage classes that are more efficient in terms of memory and search?
The Query API returns Iterables, allowing the actual search operation to be executed lazily. Nevertheless, the default implementation assembles lists eagerly. Should we change the implementation to search lazily as the API suggests? How do we deal with concurrent modifications? Should we clone the search scopes as Sven suggests?
Currently, the descriptors are implemented using plain Java classes. Given the problems we are currently facing (e.g. serialization, paging, bi-directional cross references), I am starting to wonder whether we should rather use an Ecore-based implementation, at least for the in-memory index. EMF offers good solutions that could be leveraged, such as proxy resolution, binary serialization, minimal EObject and eOpposites.
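The lazy search and concurrent modification questions can be made concrete with a small sketch. It shows an `Iterable` that evaluates the criterion only while being iterated, and a variant that first clones the search scope so later changes to the index do not affect a running query. `LazyQuery` and its methods are hypothetical helpers written for this illustration and are not the current EMF Index implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical helper; illustrates lazy evaluation behind an Iterable-returning API.
final class LazyQuery {

    // Returns a view that tests elements only while the caller iterates.
    static <T> Iterable<T> filter(Iterable<T> source, Predicate<? super T> criterion) {
        return () -> new Iterator<T>() {
            private final Iterator<T> delegate = source.iterator();
            private T pending;
            private boolean ready;

            @Override
            public boolean hasNext() {
                // Advance until the next matching element is found or the source is exhausted.
                while (!ready && delegate.hasNext()) {
                    T candidate = delegate.next();
                    if (criterion.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                T result = pending;
                pending = null;
                return result;
            }
        };
    }

    // Clones the search scope first: costs a copy, but the query is isolated from later updates.
    static <T> Iterable<T> filterSnapshot(Collection<T> source, Predicate<? super T> criterion) {
        return filter(new ArrayList<>(source), criterion);
    }
}
```

The plain `filter` view iterates the live collection, so modifying a `HashSet` during iteration would typically fail with a `ConcurrentModificationException`. The snapshot variant avoids that at the price of copying the scope up front, which is the trade-off behind the cloning suggestion.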

Query API

The current query API is limited to AND-concatenated criteria. The fluent API also returns different types, making the construction of a query somewhat tedious.

Questions:

Do we need other logical operations than AND?
Should we drop the fluent API and replace it by a QueryBuilder class?
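One possible shape for such a builder is sketched below. Every method returns the same builder type, and OR is supported next to AND. The class name `QueryBuilder` comes from the question above, but its methods and the use of `java.util.function.Predicate` are assumptions for this example, not a proposed or existing API.

```java
import java.util.function.Predicate;

// Hypothetical builder; all chaining methods return the same type.
final class QueryBuilder<T> {
    private Predicate<T> criteria; // null means "no criteria yet"

    // Adds a criterion that must also hold.
    QueryBuilder<T> and(Predicate<T> criterion) {
        criteria = (criteria == null) ? criterion : criteria.and(criterion);
        return this;
    }

    // Adds an alternative criterion.
    QueryBuilder<T> or(Predicate<T> criterion) {
        criteria = (criteria == null) ? criterion : criteria.or(criterion);
        return this;
    }

    // Builds the combined criterion; an empty builder matches everything.
    Predicate<T> build() {
        return (criteria == null) ? element -> true : criteria;
    }
}
```

Criteria are combined strictly from left to right, so `new QueryBuilder<String>().and(a).and(b).or(c).build()` evaluates as `(a AND b) OR c`. Supporting arbitrary grouping would require nested builders or an explicit expression tree.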

EReferenceDescriptor

SAP guys have proposed to introduce an EReference descriptor (see this bug).

Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.
