The purpose of this document is to describe how to use CDO optimally. It is intended for both basic and expert CDO users and is based on CDO 2.0.0 (HEAD at the time of writing).
Speeding up CDO is our constant goal and task. If you have any questions or suggestions, do not hesitate to contact any member of the CDO team.
Setting EMF Parameters
The first piece of advice for improving CDO performance concerns model definition. It does not involve CDO directly, but because CDO is built on EMF models, a poorly tuned model can make CDO seem slow. Here are a few things to consider when defining a model:
- For one-to-many (multi-valued) references, set the Unique property to “false”. Otherwise, every add and set operation has to check the list for duplicates, which fetches all objects in the list.
- If the Unique property absolutely must be “true”, the reference should at least be a containment or have a many-to-one opposite (a bidirectional relation). That way, EMF (starting from version 2.5) can speed up insertion by checking the inverse reference (eContainer or the opposite reference) instead of crawling the list.
- Set the Resolve Proxies property to “false” as well for one-to-many references. Otherwise, performance may decrease in some cases. CDO's internal structures never create EMF proxies, even for references to external data in a non-CDO resource; CDO loads the referenced objects when the list is accessed.
- In any case, both properties (Unique and Resolve Proxies) should rarely be set to “true” at the same time, especially without an opposite single-valued reference.
These simple changes can make an application up to twenty times faster. It is worth trying: add 10,000 elements to a list, with and without these settings, and compare.
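To make the cost of the Unique setting concrete, here is a simplified cost model in plain Java, not EMF's actual list code. `FetchCountingList` is a hypothetical class in which reading an element counts as a "fetch", standing in for resolving an object that CDO may have to load from the server. In a real model, the settings themselves are changed on the `EReference` (for example with `setUnique(false)` and `setResolveProxies(false)`, or the corresponding properties in the Ecore editor).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified cost model (hypothetical, not EMF code): reading an element
 * counts as a "fetch", standing in for resolving an object that CDO may
 * have to load from the server.
 */
final class FetchCountingList {
    private final List<String> ids = new ArrayList<>();
    private final boolean unique;
    private int fetches;

    FetchCountingList(boolean unique) {
        this.unique = unique;
    }

    /** Reads the element at index i and counts it as one fetch. */
    private String fetch(int i) {
        fetches++;
        return ids.get(i);
    }

    /** Appends an id; a unique list must first scan existing elements for a duplicate. */
    boolean add(String id) {
        if (unique) {
            for (int i = 0; i < ids.size(); i++) {
                if (fetch(i).equals(id)) {
                    return false; // duplicate rejected, nothing added
                }
            }
        }
        ids.add(id);
        return true;
    }

    int fetches() {
        return fetches;
    }

    int size() {
        return ids.size();
    }
}
```

In this model, adding 10,000 distinct elements costs no fetches with a non-unique list and 0 + 1 + ... + 9,999 = 49,995,000 fetches with a unique one, because every insertion scans the existing elements. The model deliberately ignores the EMF 2.5 shortcut for containment and bidirectional references.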
Caching in CDO
There are three important places in CDO where caches are used:
- CDOView maintains a cache of CDOObjects (client side). This cache is always memory-sensitive and cannot be configured.
- CDOSession (through CDORevisionManager) maintains a cache of CDORevisions (client side). This cache implements CDORevisionCache.
- IRepository (through CDORevisionManager) maintains a cache of CDORevisions (server side). This cache implements CDORevisionCache.
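CDO's own cache implementations are not reproduced here. As a sketch of what "memory-sensitive" typically means in Java, the hypothetical `SoftValueCache` below holds its values through `java.lang.ref.SoftReference`, so the garbage collector may drop cached entries under memory pressure instead of the application running out of memory. A lookup of a reclaimed entry simply returns `null`.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative memory-sensitive cache (hypothetical, not CDO's implementation):
 * values are held through SoftReferences, so the GC may reclaim them when
 * memory runs low; map entries of reclaimed values are purged lazily.
 */
final class SoftValueCache<K, V> {
    /** A soft reference that remembers its key so the map entry can be purged. */
    private static final class Entry<K, V> extends SoftReference<V> {
        final K key;

        Entry(K key, V value, ReferenceQueue<? super V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, Entry<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    void put(K key, V value) {
        purge();
        map.put(key, new Entry<>(key, value, queue));
    }

    /** Returns the cached value, or null if absent or already reclaimed by the GC. */
    V get(K key) {
        purge();
        Entry<K, V> e = map.get(key);
        return e == null ? null : e.get();
    }

    int size() {
        purge();
        return map.size();
    }

    /** Removes entries whose values have been garbage collected. */
    @SuppressWarnings("unchecked")
    private void purge() {
        Entry<K, V> e;
        while ((e = (Entry<K, V>) queue.poll()) != null) {
            map.remove(e.key, e); // only if the key still maps to this stale entry
        }
    }
}
```

A caller treats `null` as a cache miss; in a layered setup like the one listed above, that would mean asking the next level (for example, the server) for the data again.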
Loading Partial Collections – CDOCollectionLoadingPolicy
The CDOCollectionLoadingPolicy feature of the CDOSession controls how a list gets populated.
By default, when an object is fetched, all its fields are filled with the proper values. See Figure 1.
This can be time-consuming, especially if the ref1 reference is never accessed.
In CDO it is possible to fetch collections partially. The CDOCollectionLoadingPolicy feature defines how a list will be loaded.
The implementation shipped with CDO distinguishes between the following two situations:
- How many CDOIDs to fill when an object is loaded for the first time;
- Which elements to fill with CDOIDs when the accessed element is not yet filled.
CDOUtil.createCollectionLoadingPolicy(initialChunkSize, numberOfIndexToResolve);
For example, suppose the policy is defined as follows:
CDOCollectionLoadingPolicy policy = CDOUtil.createCollectionLoadingPolicy(10, 20);
session.options().setCollectionLoadingPolicy(policy);
When the oid1 object is fetched for the first time, only the first ten CDOIDs are loaded for each of its reference lists. This changes nothing for the ref1 list, since it contains only three items. However, only the first ten items of the ref2 list are loaded.
As soon as an element beyond the tenth is accessed, CDO asks the CDOCollectionLoadingPolicy to fill more elements; the example policy loads twenty more CDOIDs into the list.
Also, when the list is accessed by index, CDO does not need to fetch items from the beginning of the list, only the chunk defined by the CDOCollectionLoadingPolicy.
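This behaviour can be modelled in a few lines of plain Java. `PartialIdList` is a hypothetical class, not part of CDO. It assumes that the first `initialChunkSize` CDOIDs arrive with the object and that each later request fills up to `resolveChunkSize` empty slots starting at the accessed index, which is a simplification of what a real policy may do.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model of a partially loaded reference list (hypothetical,
 * not CDO's API). A null slot means its CDOID has not been loaded yet.
 */
final class PartialIdList {
    private final List<String> serverIds; // stands in for the list as stored on the server
    private final String[] loaded;        // client-side slots
    private final int resolveChunkSize;
    private int roundTrips;

    PartialIdList(List<String> serverIds, int initialChunkSize, int resolveChunkSize) {
        if (resolveChunkSize <= 0) {
            throw new IllegalArgumentException("resolveChunkSize must be positive");
        }
        this.serverIds = new ArrayList<>(serverIds);
        this.loaded = new String[serverIds.size()];
        this.resolveChunkSize = resolveChunkSize;
        // The first initialChunkSize CDOIDs arrive together with the object itself.
        int initial = Math.min(initialChunkSize, loaded.length);
        for (int i = 0; i < initial; i++) {
            loaded[i] = this.serverIds.get(i);
        }
    }

    /** Returns the CDOID at index, loading a chunk that starts there if it is missing. */
    String get(int index) {
        if (loaded[index] == null) {
            roundTrips++; // one request fills up to resolveChunkSize empty slots
            int end = Math.min(index + resolveChunkSize, loaded.length);
            for (int i = index; i < end; i++) {
                if (loaded[i] == null) {
                    loaded[i] = serverIds.get(i);
                }
            }
        }
        return loaded[index];
    }

    int loadedCount() {
        int count = 0;
        for (String id : loaded) {
            if (id != null) {
                count++;
            }
        }
        return count;
    }

    int roundTrips() {
        return roundTrips;
    }
}
```

With a hypothetical 100-element list and the (10, 20) policy from the example, accessing index 50 triggers one request that loads indices 50 to 69, while indices 10 to 49 stay unloaded.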
Based on some tests, good performance can be achieved by using the following settings:
session.options().setCollectionLoadingPolicy(CDOUtil.createCollectionLoadingPolicy(0, 300));
With these settings, no CDOIDs are fetched into the reference lists until a list is actually accessed; from then on, CDOIDs are loaded in chunks of up to 300.
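To get a feel for these numbers, the helper below estimates how many extra requests a full sequential iteration triggers. It is a back-of-the-envelope sketch that assumes each request loads the next `resolveChunkSize` CDOIDs; `ChunkMath` is a hypothetical name.

```java
/** Back-of-the-envelope estimate (hypothetical helper, not part of CDO). */
final class ChunkMath {
    private ChunkMath() {
    }

    /**
     * Requests needed to iterate a whole list sequentially, assuming the first
     * initialChunkSize CDOIDs come with the object and each request loads the
     * next resolveChunkSize CDOIDs.
     */
    static int roundTripsToIterate(int listSize, int initialChunkSize, int resolveChunkSize) {
        if (resolveChunkSize <= 0) {
            throw new IllegalArgumentException("resolveChunkSize must be positive");
        }
        int remaining = Math.max(0, listSize - initialChunkSize);
        return (remaining + resolveChunkSize - 1) / resolveChunkSize; // ceiling division
    }
}
```

Under that assumption, iterating over 10,000 elements takes 34 requests with (0, 300) and 500 with (10, 20), while a three-element list like ref1 needs none with (10, 20).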
End users can also provide their own implementation of the CDOCollectionLoadingPolicy interface.
Prefetching Target Objects Automatically – CDORevisionPrefetchingPolicy
The CDORevisionPrefetchingPolicy feature of the CDOView allows CDO users to fetch many objects at a time.
The difference between the CDOCollectionLoadingPolicy feature and the CDORevisionPrefetchingPolicy feature is subtle. The CDOCollectionLoadingPolicy feature determines how and when to fetch CDOIDs, while the CDORevisionPrefetchingPolicy feature determines how and when to resolve CDOIDs (i.e. fetch the target objects).
What happens when list items are being accessed? The list fetches objects one at a time.
As an example, here is what happens while iterating through the ref1 list:
- oid3 is not in the cache, load oid3
- oid4 is not in the cache, load oid4
- oid5 is not in the cache, load oid5
Each load is the slowest step: since oid3 is not in the cache, it has to be fetched from the server, and the same goes for oid4 and oid5. Every object is fetched sequentially, costing one client-server round trip each.
Why not be smarter? Why not load more objects at a time? This would reduce the number of client-server round trips. When oid3 is being loaded, oid4 and oid5 could be loaded at the same time.
- oid3 is not in the cache, load oid3, oid4, oid5
- oid4 is in the cache
- oid5 is in the cache
Instead of three calls, only one is made to the server. How many calls would be saved for a list containing 100 or 10,000 items?
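The round-trip arithmetic can be checked with a toy model. `PrefetchingClient` is hypothetical and uses plain Java collections instead of CDO's revision manager; on a cache miss it loads the missing object together with the following entries of the same list, up to `batchSize`, in a single request.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy client (hypothetical, not CDO's API) that resolves CDOIDs to objects
 * through a local cache and counts server round trips.
 */
final class PrefetchingClient {
    private final Map<String, String> cache = new HashMap<>();
    private final int batchSize;
    private int roundTrips;

    PrefetchingClient(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    /** Resolves ids[index]; on a miss, fetches it and the next ids in one request. */
    String resolve(List<String> ids, int index) {
        String id = ids.get(index);
        String obj = cache.get(id);
        if (obj == null) {
            roundTrips++; // a single request for the whole batch
            int end = Math.min(index + batchSize, ids.size());
            for (int i = index; i < end; i++) {
                String next = ids.get(i);
                cache.putIfAbsent(next, "object:" + next); // stands in for a loaded revision
            }
            obj = cache.get(id);
        }
        return obj;
    }

    int roundTrips() {
        return roundTrips;
    }
}
```

In this model, iterating over 10,000 items costs 10,000 round trips with a batch size of 1 and 100 with a batch size of 100; for the ref1 example (oid3, oid4, oid5), a batch size of 3 gives the single call shown above. Larger batches trade fewer round trips for the risk of loading objects that are never used.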
This feature uses CDOView.setRevisionPrefetchingPolicy. For example:
End users can also provide their own implementation of the CDORevisionPrefetchingPolicy interface.
Prefetching Nested Objects Explicitly – cdoPrefetch()
As of CDO 3.0 the CDOObject interface supports prefetching of (the revisions for) nested objects, e.g.:
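What prefetching nested objects "to a given depth" involves can be sketched without CDO. The hypothetical `TreeNode` class below stands in for a CDOObject and its contained children, and `collectToDepth` gathers the IDs of everything nested up to a depth in one breadth-first pass, so that they could be requested from the server in a single batch rather than one by one. This illustrates the idea only; it is not CDO's implementation of `cdoPrefetch()`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical node standing in for a CDOObject with contained children. */
final class TreeNode {
    final String id;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String id, TreeNode... kids) {
        this.id = id;
        for (TreeNode k : kids) {
            children.add(k);
        }
    }

    /**
     * Collects the IDs of this node and of all nodes nested under it, at most
     * maxDepth levels down, in breadth-first order (one batch instead of one
     * request per object).
     */
    List<String> collectToDepth(int maxDepth) {
        List<String> ids = new ArrayList<>();
        Deque<TreeNode> nodes = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();
        nodes.add(this);
        depths.add(0);
        while (!nodes.isEmpty()) {
            TreeNode n = nodes.poll();
            int d = depths.poll();
            ids.add(n.id);
            if (d < maxDepth) { // descend only while below the depth limit
                for (TreeNode c : n.children) {
                    nodes.add(c);
                    depths.add(d + 1);
                }
            }
        }
        return ids;
    }
}
```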
Defining Fetch Rules Dynamically – CDOFetchAnalyzer
Many applications use hard-coded rules to decide what to fetch, mainly to speed themselves up. For a specific context, these rules define which paths to load from a root object, so that only the data that is actually needed gets loaded. Such rules are usually hard to maintain, because both models and applications change over time.
The CDOFetchAnalyzer feature also defines such rules, but dynamically. It detects patterns in the way objects are accessed in a specific context and, when that context comes back, it loads the same paths from different root objects.
Examples will be available soon. (Contributions welcome!)
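As a rough illustration of the idea only, here is a plain-Java sketch. `FetchRuleLearner` is hypothetical and is not the CDOFetchAnalyzer API: it records which reference paths (written as dotted strings, an assumption of this sketch) are navigated within a named context, and returns them when the same context starts again, so that they could be loaded up front from the new root object.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Rough illustration of dynamic fetch rules (hypothetical, not the
 * CDOFetchAnalyzer API): learn the paths navigated per context and
 * replay them as prefetch hints for the next root object.
 */
final class FetchRuleLearner {
    private final Map<String, Set<String>> pathsByContext = new HashMap<>();

    /** Records that code running in the given context navigated the path, e.g. "lines.product". */
    void recordAccess(String context, String path) {
        pathsByContext.computeIfAbsent(context, c -> new LinkedHashSet<>()).add(path);
    }

    /** Paths worth prefetching when the context starts on a new root object. */
    Set<String> pathsToPrefetch(String context) {
        Set<String> paths = pathsByContext.get(context);
        return paths == null ? Collections.emptySet() : Collections.unmodifiableSet(paths);
    }
}
```

Unlike a hard-coded rule, the learned set follows whatever the application actually navigates, so it adapts when the model or the application changes.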