CDO/Tweaking Performance

From Eclipsepedia


The purpose of this document is to describe ways of using CDO optimally. It is intended for both basic and expert users of CDO and is based on CDO 2.0.0 (HEAD at the time of writing).

Speeding up CDO is our constant goal and task. If you have any questions or suggestions, do not hesitate to contact any member of the CDO team.


Using EMF

Defining a Model

The first piece of advice for improving CDO performance concerns model definition. It does not involve CDO directly, but because CDO uses EMF models, a poorly tuned model can make CDO seem slow under certain circumstances. Therefore, here are a few things to consider when defining a model:

  • The uniqueness checks that are inherent to many operations on many-valued features with unique==true can be very expensive. In general, they increase the effort of inserting 'n' elements into a list to O('n' * 'n'). For a detailed discussion, see Bugzilla 408197. Please note that:
    • All EObject lists are implicitly treated as unique. I don't know exactly why EMF doesn't support non-unique, unidirectional EObject lists.
    • The uniqueness check of a bidirectional reference with a single-valued opposite costs O(1), resulting in an insertion effort of O('n'). Many-valued containment references are a prominent example where this EMF optimization applies.
    • If you know 'a priori' that your algorithm never violates a uniqueness constraint (for example, because you 'copy' from a valid object), you can safely cast your list to 'InternalEList' and call methods such as addUnique() or addAllUnique() to avoid redundant uniqueness checks.
  • The Resolve Proxies property should also be set to “false” for one-to-many references. Otherwise, performance may decrease in some cases. CDO never creates EMF proxies internally, even when it references external data in a non-CDO resource; instead, it loads the referenced objects when the list is accessed.
  • In any case, the Unique and Resolve Proxies properties should rarely be enabled at the same time, especially on references without a single-valued opposite.

By doing these simple things, CDO users can achieve a twentyfold performance improvement in their application. It is worth trying: add 10,000 elements to a list, with and without those changes. See the difference!
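To make the cost of uniqueness checks concrete, here is a small, self-contained Java sketch. It does not use EMF: CountingUniqueList is a hypothetical stand-in that mimics a unique list by scanning for a duplicate before every add and counting the comparisons, while its addUnique() method only mirrors the idea behind InternalEList.addUnique() (append without checking).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not EMF code: a list that enforces uniqueness
// by scanning all existing elements before each add, like a naive unique list.
public class UniqueListCost {

    static final class CountingUniqueList<E> {
        private final List<E> elements = new ArrayList<>();
        long comparisons = 0; // number of equality checks performed so far

        /** Checked add: linear scan for a duplicate, then append. */
        boolean add(E e) {
            for (E existing : elements) {
                comparisons++;
                if (existing.equals(e)) {
                    return false; // duplicate rejected
                }
            }
            return elements.add(e);
        }

        /** Unchecked add, similar in spirit to InternalEList.addUnique(). */
        void addUnique(E e) {
            elements.add(e);
        }

        int size() {
            return elements.size();
        }
    }

    public static void main(String[] args) {
        int n = 10_000;

        CountingUniqueList<Integer> checked = new CountingUniqueList<>();
        for (int i = 0; i < n; i++) checked.add(i);

        CountingUniqueList<Integer> unchecked = new CountingUniqueList<>();
        for (int i = 0; i < n; i++) unchecked.addUnique(i);

        // Checked inserts cost 0 + 1 + ... + (n - 1) = n(n - 1) / 2 comparisons.
        System.out.println("checked comparisons:   " + checked.comparisons);   // 49995000
        System.out.println("unchecked comparisons: " + unchecked.comparisons); // 0
    }
}
```

For 10,000 elements the checked variant performs 49,995,000 comparisons, while the unchecked one performs none; this quadratic growth is what the hints above avoid.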

Batch processing existing metamodels (using XQuery)

If you have an XQuery processor installed (BaseX, which is BSD-licensed, is a good choice), you can use the following XQuery script to batch process your existing .ecore files and apply the performance hints above. Only references with an upper bound of -1 are changed: Resolve Proxies is set to false on all of them, and Unique is set to false on those that are neither containment references nor have an opposite.

let $dir := "path/to/your/ecore/files/" (: e.g.: "c:/workspace/com.example.model/model/", don't forget the ending '/':)
for $file in file:list($dir, false(), "*.ecore")
let $fullPath := concat($dir, $file)
return fn:put(
  copy $c := doc($fullPath)
  modify (
    (: this part sets 'Resolve Proxies' to false on all many-valued references :)
    for $r in $c//eStructuralFeatures
    where $r[@upperBound = -1 and @*:type = "ecore:EReference"]
    return (
      delete node $r/@resolveProxies,
      insert node attribute {'resolveProxies'}{'false'} into $r
    ),
    (: this part sets 'Unique' to false on many-valued, non-containment references without an opposite :)
    for $r in $c//eStructuralFeatures
    where $r[@upperBound = -1 and @*:type = "ecore:EReference" and not(@containment = 'true') and empty(@eOpposite)]
    return (
      delete node $r/@unique,
      insert node attribute {'unique'}{'false'} into $r
    )
  )
  return $c,
  (: write the modified copy back to the original file :)
  $fullPath
)

Attention: the script overwrites your .ecore files in place, so any custom indentation or formatting will be lost (if you care about that).

Caching in CDO

There are three important places in CDO where caches are used:

  • CDOView maintains a cache of CDOObjects (client side). This cache is always a memory-sensitive cache and is not configurable.
  • CDOSession (through CDORevisionManager) maintains a cache of CDORevisions (client side). This cache implements CDORevisionCache.
  • IRepository (through CDORevisionManager) maintains a cache of CDORevisions (server side). This cache implements CDORevisionCache.
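What "memory-sensitive" means can be illustrated with a minimal Java sketch. The SoftCache class below is hypothetical and is not CDO's cache implementation; it simply holds its values through java.lang.ref.SoftReference, so the garbage collector may discard cached entries when memory runs low, after which a lookup returns null and the caller has to load the object again.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not CDO's implementation: values are held via SoftReferences,
// so the JVM may reclaim them under memory pressure.
public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    /** Returns the cached value, or null if absent or already reclaimed by the GC. */
    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key); // drop the stale entry whose referent was collected
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCache<String, String> cache = new SoftCache<>();
        String revision = "revision of oid3"; // strongly reachable while in use
        cache.put("oid3", revision);
        System.out.println(cache.get("oid3")); // revision of oid3
        System.out.println(cache.get("oid4")); // null
    }
}
```

The design trade-off is that such a cache never needs a size limit, but its hit rate depends on heap pressure rather than on a configurable policy.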

Loading Partial Collections – CDOCollectionLoadingPolicy

The CDOCollectionLoadingPolicy feature of the CDOSession controls how a list gets populated.

By default, when an object is fetched, all its fields are filled with the proper values. See Figure 1.

[Figure 1]

This can be time-consuming, especially if the ref1 list is never accessed.

In CDO it is possible to fetch collections partially. The CDOCollectionLoadingPolicy feature defines how a list will be loaded.

The implementation shipped with CDO distinguishes between the following two situations:

  • How many CDOIDs to fill when an object is loaded for the first time;
  • Which elements to fill with CDOIDs when the accessed element is not yet filled.
  CDOUtil.createCollectionLoadingPolicy(initialChunkSize, numberOfIndexToResolve);

Example: suppose the policy is created as follows:

CDOCollectionLoadingPolicy policy = CDOUtil.createCollectionLoadingPolicy(10, 20);

When the oid1 object is fetched for the first time, only the first ten CDOIDs are loaded for each of its list features. This changes nothing for the ref1 list, since it contains only 3 items. However, the ref2 list will contain only ten items:

[Figure 2]

As soon as any element beyond the tenth is accessed, CDO asks the CDOCollectionLoadingPolicy to fill more elements. The example policy would load twenty more CDOIDs into the list.

Also, if the list is accessed by index, CDO does not need to fetch all items from the beginning of the list; it only fetches the elements that the CDOCollectionLoadingPolicy selects for the accessed index.
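The chunked behavior described above can be simulated with a small Java sketch. ChunkedIdList is hypothetical and is not CDO's CDOCollectionLoadingPolicy implementation; in particular, it assumes that an on-demand chunk starts at the accessed index and extends forward, which may differ from the chunk selection CDO actually uses. It counts how many extra server round trips a list needs.

```java
import java.util.Arrays;

// Hypothetical simulation, not CDO code: a list of IDs that is only partially
// populated and loads further chunks on demand.
public class ChunkedIdList {
    private final long[] serverIds;  // stands in for the IDs stored on the server
    private final boolean[] loaded;  // which positions already hold an ID
    private final int resolveChunkSize;
    int roundTrips = 0;              // extra server calls made after the initial load

    ChunkedIdList(long[] serverIds, int initialChunkSize, int resolveChunkSize) {
        this.serverIds = serverIds;
        this.loaded = new boolean[serverIds.length];
        this.resolveChunkSize = resolveChunkSize;
        // Assumption: the initial chunk arrives together with the owning object.
        Arrays.fill(loaded, 0, Math.min(initialChunkSize, serverIds.length), true);
    }

    long get(int index) {
        if (!loaded[index]) {
            // Assumption: fetch one chunk starting at the accessed index.
            int end = Math.min(index + resolveChunkSize, serverIds.length);
            Arrays.fill(loaded, index, end, true);
            roundTrips++;
        }
        return serverIds[index];
    }

    public static void main(String[] args) {
        long[] ids = new long[100];
        for (int i = 0; i < ids.length; i++) ids[i] = 1000 + i;

        // Mirrors createCollectionLoadingPolicy(10, 20) from the example.
        ChunkedIdList ref2 = new ChunkedIdList(ids, 10, 20);
        for (int i = 0; i < ids.length; i++) ref2.get(i);
        System.out.println("round trips: " + ref2.roundTrips); // 5
    }
}
```

Iterating over 100 elements with (10, 20) costs five extra round trips (indices 10, 30, 50, 70 and 90), and a random access such as get(75) on a fresh list loads only indices 75 to 94 without touching the elements in between.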

Based on some tests, good performance can be achieved by using the following settings:

  session.options().setCollectionLoadingPolicy(CDOUtil.createCollectionLoadingPolicy(0, 300));

The code line above means that no CDOIDs are fetched into the reference lists until the lists are actually accessed; from then on, CDOIDs are loaded in chunks of 300.

End users can also provide their own implementation of the CDOCollectionLoadingPolicy interface.

Prefetching Target Objects Automatically – CDORevisionPrefetchingPolicy

The CDORevisionPrefetchingPolicy feature of the CDOView allows CDO users to fetch many objects at a time.

The difference between the CDOCollectionLoadingPolicy feature and the CDORevisionPrefetchingPolicy feature is subtle. The CDOCollectionLoadingPolicy feature determines how and when to fetch CDOIDs, while the CDORevisionPrefetchingPolicy feature determines how and when to resolve CDOIDs (i.e. fetch the target objects).

What happens when list items are accessed? Without a prefetching policy, the list fetches the target objects one at a time.

As an example, here is what happens while iterating through the ref1 list:

[Figure 3]

  2. oid3 is not in the cache, load oid3
  4. oid4 is not in the cache, load oid4
  6. oid5 is not in the cache, load oid5

Steps 2, 4 and 6 are the slowest operations. Since oid3 is not in the cache, it will be fetched from the server. Every object will be fetched sequentially.
Why not be smarter? Why not load more objects at a time? This would reduce the number of client-server round trips. When oid3 is being loaded, oid4 and oid5 could be loaded at the same time.

  2. oid3 is not in the cache, load oid3, oid4, oid5
  4. oid4 is in the cache
  6. oid5 is in the cache

Instead of three calls, only one will be made to the server. How many calls would be saved for a list containing 100 or 10,000 items?
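The question above comes down to simple arithmetic: if none of the targets are cached yet and each round trip resolves up to N objects, a list of s items needs ceil(s / N) round trips instead of s. The Java sketch below only computes that number; PrefetchRoundTrips is a hypothetical helper, and the batch size of 10 is borrowed from the createRevisionPrefetchingPolicy(10) example in this section. Real behavior also depends on what is already in the revision cache.

```java
// Hypothetical arithmetic sketch: server round trips needed to resolve every
// element of a list when targets are fetched in batches of prefetchSize.
public class PrefetchRoundTrips {

    /** Ceiling division: each round trip resolves up to prefetchSize objects. */
    static int roundTrips(int listSize, int prefetchSize) {
        if (prefetchSize < 1) {
            throw new IllegalArgumentException("prefetchSize must be >= 1");
        }
        return (listSize + prefetchSize - 1) / prefetchSize;
    }

    public static void main(String[] args) {
        int[] sizes = {3, 100, 10_000};
        for (int size : sizes) {
            System.out.printf("%6d items: %6d trips one by one, %5d trips in batches of 10%n",
                    size, roundTrips(size, 1), roundTrips(size, 10));
        }
    }
}
```

With batches of 10, a list of 100 items needs 10 round trips instead of 100, and a list of 10,000 items needs 1,000 instead of 10,000.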
This feature is configured through the options of the CDOView. For example:

view.options().setRevisionPrefetchingPolicy(CDOUtil.createRevisionPrefetchingPolicy(10));

End users can also provide their own implementation of the CDORevisionPrefetchingPolicy interface.

Prefetching Nested Objects Explicitly – cdoPrefetch()

As of CDO 3.0, the CDOObject interface supports prefetching (the revisions of) nested objects through cdoPrefetch().
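The idea of prefetching nested objects can be sketched independently of CDO: walk the containment tree of an object down to a chosen depth and collect the IDs, so that all of them could be requested in one batch instead of one round trip per object. The Node record and the collectIds() helper below are hypothetical and are not the cdoPrefetch() implementation; the depth parameter is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not CDO code: collect the IDs of an object's contained
// objects down to a given depth so they could be fetched in a single batch.
public class NestedPrefetch {

    record Node(String id, List<Node> children) {
        Node(String id, Node... children) {
            this(id, List.of(children));
        }
    }

    /** Breadth-first traversal; depth 0 means only the root itself. */
    static List<String> collectIds(Node root, int depth) {
        List<String> ids = new ArrayList<>();
        List<Node> level = List.of(root);
        for (int d = 0; d <= depth && !level.isEmpty(); d++) {
            List<Node> next = new ArrayList<>();
            for (Node n : level) {
                ids.add(n.id());
                next.addAll(n.children());
            }
            level = next;
        }
        return ids;
    }

    public static void main(String[] args) {
        Node tree = new Node("oid1",
                new Node("oid2", new Node("oid4"), new Node("oid5")),
                new Node("oid3"));
        System.out.println(collectIds(tree, 1)); // [oid1, oid2, oid3]
        System.out.println(collectIds(tree, 2)); // [oid1, oid2, oid3, oid4, oid5]
    }
}
```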


Defining Fetch Rules Dynamically – CDOFetchAnalyzer

In many applications, hard-coded rules determine what to fetch, mainly to speed up the application. Basically, these rules define, for a specific context, which paths to load from a root object, so that only the data that is actually needed gets loaded. Usually, these rules are really hard to maintain: models change, applications change, ...

The CDOFetchAnalyzer feature can be used to define rules, but it does so in a dynamic fashion. It detects patterns in the way objects are accessed in a specific context and, when that context comes back, it loads the same path from different root objects.
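The underlying idea (learn which paths are navigated in a context, then replay them for the next root object) can be illustrated with a hypothetical Java sketch. FetchRuleRecorder, its methods, and the context and path names are all invented for illustration; this is not the CDOFetchAnalyzer API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of dynamic fetch rules, not the CDOFetchAnalyzer API:
// record which feature paths are traversed from a root object in a context,
// and when the same context recurs, return those paths so they can be prefetched.
public class FetchRuleRecorder {
    private final Map<String, Set<String>> pathsByContext = new HashMap<>();

    /** Called whenever the application navigates a feature path (e.g. "orders/items") in a context. */
    void recordAccess(String context, String featurePath) {
        pathsByContext.computeIfAbsent(context, c -> new LinkedHashSet<>()).add(featurePath);
    }

    /** Paths worth loading eagerly the next time this context starts from a new root. */
    List<String> pathsToPrefetch(String context) {
        return new ArrayList<>(pathsByContext.getOrDefault(context, Set.of()));
    }

    public static void main(String[] args) {
        FetchRuleRecorder recorder = new FetchRuleRecorder();
        // First visit of the "invoiceView" context, starting from one root object.
        recorder.recordAccess("invoiceView", "orders");
        recorder.recordAccess("invoiceView", "orders/items");
        recorder.recordAccess("invoiceView", "orders"); // duplicate is ignored
        // Later, the same context starts from another root: prefetch the learned paths.
        System.out.println(recorder.pathsToPrefetch("invoiceView")); // [orders, orders/items]
        System.out.println(recorder.pathsToPrefetch("reportView"));  // []
    }
}
```

Unlike hard-coded rules, nothing here has to be updated when the model or the application changes; the recorded paths simply follow the observed access pattern.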

Examples will be available soon. (Contributions welcome!)
