Difference between revisions of "CDO/Tweaking Performance"
Revision as of 01:09, 19 December 2010
The purpose of this document is to describe ways of using CDO optimally. It is intended for both basic and expert users of CDO and is based on CDO 2.0.0 (HEAD at the time of writing).
Speeding up CDO is our constant goal and task. If you have any questions or suggestions, do not hesitate to contact any member of the CDO team.
Setting EMF Parameters
The first piece of advice for improving CDO performance concerns model definition. It does not involve CDO directly, but the way a model is defined can make CDO seem slow. Therefore, here are a few things to consider while defining a model:
- For one-to-many relationships, the Unique property should be set to “false”. Otherwise, add and set operations will fetch all objects in the list.
- If it is absolutely necessary to set the Unique property to “true”, the reference should at least be a containment or have a bidirectional many-to-one relation. That way, EMF will be able (starting from version 2.5) to accelerate insertion by looking up the inverse reference (eContainer or opposite reference) instead of crawling the list.
- The Resolve Proxies property should be set to “false” as well in one-to-many relationships. Otherwise, performance may decrease in some cases. Internally, CDO never creates EMF proxies, even when it references external data in a non-CDO resource. CDO loads such objects when the list is accessed.
- In any case, both properties (Unique and Resolve Proxies) should rarely be used at the same time, especially without an opposite single reference.
With these simple changes, CDO users can get a twentyfold performance improvement in their application. It is worth trying: add 10,000 elements to a list, with and without those changes, to see the difference.
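To see why a uniqueness check becomes expensive as a list grows, here is a minimal sketch that uses only java.util, not EMF or CDO. The class UniqueListCostSketch and its Item type are hypothetical names made up for this illustration. It assumes that uniqueness is enforced by scanning the list before each add, which is the "crawling" behaviour described above when no inverse reference is available.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain java.util stand-in for a many-valued reference; this is not EMF or CDO code. */
final class UniqueListCostSketch {

    /** Element whose equals() counts its calls, so the cost of uniqueness checks is visible. */
    static final class Item {
        static long equalsCalls = 0;
        final int id;

        Item(int id) { this.id = id; }

        @Override public boolean equals(Object other) {
            equalsCalls++;
            return other instanceof Item && ((Item) other).id == id;
        }

        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    /** Adds n distinct items; with unique == true every add first scans the whole list. */
    static long countComparisons(int n, boolean unique) {
        Item.equalsCalls = 0;
        List<Item> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Item item = new Item(i);
            if (!unique || !list.contains(item)) {
                list.add(item);
            }
        }
        return Item.equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("unique=false comparisons: " + countComparisons(10_000, false));
        System.out.println("unique=true comparisons:  " + countComparisons(10_000, true));
    }
}
```

In this sketch, adding n distinct elements with the check enabled costs n(n-1)/2 comparisons, so 10,000 elements need 49,995,000 comparisons, while the non-unique variant needs none. In CDO each of those comparisons may additionally require the compared object to be fetched.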
Loading Partial Collections – CDOCollectionLoadingPolicy
The CDOCollectionLoadingPolicy feature of the CDOSession controls how a list gets populated.
By default, when an object is fetched, all its fields are filled with the proper values. See Figure 1.
This could be time-consuming, especially if the ref1 reference does not need to be accessed.
In CDO it is possible to fetch collections partially. The CDOCollectionLoadingPolicy feature defines how a list will be loaded.
The implementation that is shipped with CDO makes a distinction between the two following situations:
- How many CDOIDs to fill when an object is loaded for the first time;
- Which elements to fill with CDOIDs when the accessed element is not yet filled.
CDOUtil.createCollectionLoadingPolicy (initialChunkSize, numberOfIndexToResolve);
Example: Let's suppose that the implementation is defined as follows:
CDOCollectionLoadingPolicy policy = CDOUtil.createCollectionLoadingPolicy(10, 20);
session.options().setCollectionLoadingPolicy(policy);
When the oid1 object gets fetched for the first time, only the first ten CDOIDs will be loaded for every list attribute it has. This changes nothing for the ref1 list since it contains only 3 items. However, the ref2 list will contain ten items only:
As soon as any element beyond the tenth element gets accessed in the list, CDO asks the CDOCollectionLoadingPolicy feature to fill more elements. The example policy would load twenty more CDOIDs into the list.
Also, if the list is accessed by index, CDO does not need to fetch the items from the beginning of the list, only the range determined by the CDOCollectionLoadingPolicy feature.
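The behaviour described above can be illustrated with a small toy model in plain Java. It is not CDO code: ChunkedIdListSketch and its methods are hypothetical names, the "server" is just an in-memory list, and the sketch assumes that a missing element triggers one fetch of a chunk starting at the accessed index. The real policy may choose the range differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of partial collection loading; all names are hypothetical, not CDO API. */
final class ChunkedIdListSketch {
    private final List<Long> serverIds;  // stands in for the full list held by the repository
    private final Long[] localIds;       // null marks a slot whose ID has not been fetched yet
    private final int resolveChunkSize;
    private int fetchCount = 0;          // simulated round trips after the initial load

    ChunkedIdListSketch(List<Long> serverIds, int initialChunkSize, int resolveChunkSize) {
        this.serverIds = serverIds;
        this.localIds = new Long[serverIds.size()];
        this.resolveChunkSize = Math.max(1, resolveChunkSize);
        fill(0, initialChunkSize);       // the first load fills only the initial chunk
    }

    /** Copies up to count IDs from the simulated server, starting at index from. */
    private void fill(int from, int count) {
        int to = Math.min(from + count, localIds.length);
        for (int i = from; i < to; i++) {
            localIds[i] = serverIds.get(i);
        }
    }

    /** Returns the ID at index, fetching one more chunk if that slot is still empty. */
    Long get(int index) {
        if (localIds[index] == null) {
            fetchCount++;
            fill(index, resolveChunkSize);
        }
        return localIds[index];
    }

    int fetchCount() { return fetchCount; }

    /** Helper that builds the non-null IDs 0 .. n-1 for the simulated server. */
    static List<Long> idsUpTo(int n) {
        List<Long> ids = new ArrayList<>();
        for (long id = 0; id < n; id++) {
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        ChunkedIdListSketch list = new ChunkedIdListSketch(idsUpTo(100), 10, 20);
        list.get(5);   // already loaded by the initial chunk
        list.get(50);  // triggers one fetch for indices 50..69
        System.out.println("fetches: " + list.fetchCount());
    }
}
```

With an initial chunk size of 10 and a resolve chunk size of 20, accessing index 5 costs nothing extra, while accessing index 50 directly triggers a single fetch for indices 50 to 69 without loading indices 10 to 49.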
Based on some tests, good performance can be achieved by using the following settings:
session.options().setCollectionLoadingPolicy(CDOUtil.createCollectionLoadingPolicy(0, 300));
The code line above means that no CDOIDs should be fetched into the reference lists until the lists are actually accessed.
End users can also provide their own implementation of the CDOCollectionLoadingPolicy interface.
Prefetching Target Objects – CDORevisionPrefetchingPolicy
The CDORevisionPrefetchingPolicy feature of the CDOView allows CDO users to fetch many objects at a time.
The difference between the CDOCollectionLoadingPolicy feature and the CDORevisionPrefetchingPolicy feature is subtle. The CDOCollectionLoadingPolicy feature determines how and when to fetch CDOIDs, while the CDORevisionPrefetchingPolicy feature determines how and when to resolve CDOIDs (i.e. fetch the target objects).
What happens when list items are being accessed? The list fetches objects one at a time.
As an example, here is what happens while iterating through the ref1 list:
- oid3 is not in the cache, load oid3
- oid4 is not in the cache, load oid4
- oid5 is not in the cache, load oid5
The load steps are the slowest operations. Since oid3 is not in the cache, it has to be fetched from the server, and so does every following object, one after the other.
Why not be smarter? Why not load more objects at a time? This would reduce the number of client-server round trips. When oid3 is being loaded, oid4 and oid5 could be loaded at the same time.
- oid3 is not in the cache, load oid3, oid4, oid5
- oid4 is in the cache
- oid5 is in the cache
Instead of three calls, only one call will be made to the server. How many calls would be saved for a list containing 100 or 10,000 items?
This feature is configured through CDOView.setRevisionPrefetchingPolicy.
End users can also provide their own implementation of the CDORevisionPrefetchingPolicy interface.
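The effect of prefetching on the number of round trips can be estimated with a small simulation in plain Java. It is only a sketch: PrefetchSketch, its in-memory "server" and the chunkSize parameter are hypothetical and not part of the CDO API. It assumes that a cache miss loads the missed object together with the objects that follow it in the list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy simulation of revision prefetching; all names are hypothetical, not CDO API. */
final class PrefetchSketch {
    private final Map<Long, String> cache = new HashMap<>();
    private int roundTrips = 0;

    /** Simulated server call: one round trip returns every requested object. */
    private Map<Long, String> loadFromServer(List<Long> ids) {
        roundTrips++;
        Map<Long, String> result = new HashMap<>();
        for (Long id : ids) {
            result.put(id, "object-" + id);
        }
        return result;
    }

    /** Iterates the list; a cache miss loads the missed ID plus the following ones, chunkSize in total. */
    int iterate(List<Long> ids, int chunkSize) {
        for (int i = 0; i < ids.size(); i++) {
            if (!cache.containsKey(ids.get(i))) {
                int end = Math.min(i + Math.max(1, chunkSize), ids.size());
                cache.putAll(loadFromServer(ids.subList(i, end)));
            }
        }
        return roundTrips;
    }

    /** Counts the round trips needed to iterate a fresh list of listSize objects. */
    static int roundTripsFor(int listSize, int chunkSize) {
        List<Long> ids = new ArrayList<>();
        for (long id = 0; id < listSize; id++) {
            ids.add(id);
        }
        return new PrefetchSketch().iterate(ids, chunkSize);
    }

    public static void main(String[] args) {
        System.out.println("3 items, one at a time:   " + roundTripsFor(3, 1));
        System.out.println("3 items, chunk of 3:      " + roundTripsFor(3, 3));
        System.out.println("10,000 items, chunk 100:  " + roundTripsFor(10_000, 100));
    }
}
```

In this model, a three-item list drops from three round trips to one, and a 10,000-item list with a chunk size of 100 needs 100 round trips instead of 10,000. The right chunk size in a real application depends on object size and on how much of the list is actually traversed.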
Defining Fetch Rules Dynamically – CDOFetchAnalyzer
In many applications, hard-coded rules are used to determine what to fetch, mainly to speed up the application. Basically, these rules define, for a specific context, which path to load from a root object. That way, only the data that is needed gets loaded. Usually, these rules are really hard to maintain: models change, applications change, ...
The CDOFetchAnalyzer feature can be used to define rules, but it does so in a dynamic fashion. It detects patterns in the way objects are accessed in a specific context and, when that context comes back, it loads the same path from different root objects.
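The idea of learning fetch rules from observed accesses can be sketched conceptually in plain Java. This is not how CDOFetchAnalyzer is implemented. FetchRuleLearnerSketch, its string-based contexts and its feature paths are hypothetical simplifications that only show the record-then-replay principle.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Conceptual sketch of dynamic fetch rules; all names are hypothetical, not CDO API. */
final class FetchRuleLearnerSketch {
    // For each context, the feature paths that were accessed, in first-seen order.
    private final Map<String, Set<String>> pathsByContext = new HashMap<>();

    /** Records that featurePath was navigated from a root object while in the given context. */
    void recordAccess(String context, String featurePath) {
        pathsByContext.computeIfAbsent(context, c -> new LinkedHashSet<>()).add(featurePath);
    }

    /** Returns the paths worth prefetching for the next root object seen in this context. */
    Set<String> pathsToPrefetch(String context) {
        Set<String> paths = pathsByContext.get(context);
        return paths == null
            ? Collections.<String>emptySet()
            : Collections.unmodifiableSet(paths);
    }

    public static void main(String[] args) {
        FetchRuleLearnerSketch learner = new FetchRuleLearnerSketch();
        learner.recordAccess("orderView", "customer");
        learner.recordAccess("orderView", "customer.address");
        System.out.println(learner.pathsToPrefetch("orderView"));
    }
}
```

When the same context comes back with a different root object, the recorded paths could be loaded in one request instead of being discovered again one access at a time.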
Examples will be available soon. (Contributions welcome!)
Caching in CDO
There are three important places in CDO where caches are used:
- CDOView maintains a cache of CDOObjects (client side). This cache is always a memory sensitive cache which is not configurable.
- CDOSession (through CDORevisionManager) maintains a cache of CDORevisions (client side). This cache implements CDORevisionCache, which is described below.
- IRepository (through IRevisionManager) maintains a cache of CDORevisions (server side). This cache implements CDORevisionCache, which is described below.
Tweaking the CDORevisionCaches
- The revision resolver in a client session is a CDORevisionManager. If a requested revision does not exist in its CDORevisionCache, the CDORevisionManager loads this revision from the repository (possibly going over the network) and puts it into the cache.
- The revision resolver in a server repository is an IRevisionManager. If a requested revision does not exist in its CDORevisionCache, the IRevisionManager loads this revision from the persistent back-end store (possibly going over another network) and puts it into the cache.
All caching aspects (except the cache miss handling mentioned above) are handled uniformly in the common base type of the two revision managers (CDORevisionResolver):
public CDORevisionCache getCache();
public void setCache(CDORevisionCache cache);
If the setter is called to configure the CDORevisionCache instance used by the manager, this must happen before the manager is activated (the revision managers are automatically activated when their CDOSession/IRepository is activated). If the setter has not been called before the manager is activated, a default cache is created and configured (see below).
As of this writing CDO ships with three different CDORevisionCache implementations:
- LRURevisionCache is a fixed size cache with a least recently used (LRU) eviction policy. An LRURevisionCache maintains two separate LRU lists, one for current revisions (i.e. those with revised == CDORevision_UNSPECIFIED_TIME) and one for revised revisions (i.e. those with revised != CDORevision_UNSPECIFIED_TIME). The capacity of the two fixed size LRU lists can be configured separately. To create an LRURevisionCache call CDORevisionCacheUtil.createLRUCache(int capacityCurrent, int capacityRevised).
- MEMRevisionCache is a memory sensitive cache without any special eviction policy (as this is not possible with memory sensitive caching in general). This type of cache cannot be configured. To create a MEMRevisionCache call CDORevisionCacheUtil.createMEMCache().
- TwoRevisionCache is a delegating cache with two delegation levels. You can set each level independently thereby combining the behaviours of other cache types in a predictable order. Revisions dropped from the first level cache are saved to the second level cache automatically. Cache lookup always delegates to the first level cache and only in case of a miss there it delegates to the second level cache. To create a TwoRevisionCache call CDORevisionCacheUtil.createTwoLevelCache(CDORevisionCache level1, CDORevisionCache level2).
Of course you can also write your own CDORevisionCache implementation and use it at client and/or server side.
The default cache (used when no cache has been explicitly set before the revision manager is activated) is a TwoRevisionCache with an LRURevisionCache as the first level and a MEMRevisionCache as the second level. The default capacities of the two LRU lists of the fixed size cache (first level) are declared in CDORevisionCacheUtil:
public static final int DEFAULT_CAPACITY_CURRENT = 1000;
public static final int DEFAULT_CAPACITY_REVISED = 1000;
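The interplay of a fixed size LRU first level and a memory sensitive second level can be illustrated with standard Java collections. The following is a simplified sketch, not the CDO implementation. TwoLevelCacheSketch is a hypothetical class, it uses a single LRU list instead of separate lists for current and revised revisions, and it assumes that soft references are an acceptable stand-in for "memory sensitive".

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified two-level cache: fixed size LRU first, soft references second. Not CDO code. */
final class TwoLevelCacheSketch<K, V> {
    // Second level: entries may be reclaimed by the garbage collector under memory pressure.
    private final Map<K, SoftReference<V>> level2 = new HashMap<>();
    // First level: bounded map kept in access order.
    private final LinkedHashMap<K, V> level1;

    TwoLevelCacheSketch(final int capacity) {
        // accessOrder = true keeps the least recently used entry at the head of the map.
        level1 = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > capacity) {
                    // Entries dropped from the first level are saved to the second level.
                    level2.put(eldest.getKey(), new SoftReference<V>(eldest.getValue()));
                    return true;
                }
                return false;
            }
        };
    }

    void put(K key, V value) {
        level1.put(key, value);
    }

    /** Looks in the first level and only on a miss there in the second level. */
    V get(K key) {
        V value = level1.get(key);
        if (value != null) {
            return value;
        }
        SoftReference<V> ref = level2.get(key);
        return ref == null ? null : ref.get();
    }

    boolean inLevel1(K key) {
        return level1.containsKey(key);
    }
}
```

With a capacity of 2, putting three entries pushes the least recently used one into the second level, where it can still be found until the garbage collector reclaims it.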
You should now have an impression of how CDORevisionResolver and CDORevisionCache work together, as well as of the different types of caches and their configuration. As mentioned earlier, it is important to set and configure your caches before your CDOSession or IRepository is activated. There are many different ways to create, wire and configure these instances. Some of them are explained below.
At client side you can programmatically open a CDOSession through an instance of CDOSessionConfiguration:
CDORevisionCache revisionCache = CDORevisionCacheUtil.createTwoLevelCache(
    CDORevisionCacheUtil.createLRUCache(100000, 100),
    CDORevisionCacheUtil.createMEMCache());

CDOSessionConfiguration configuration = CDOUtil.createSessionConfiguration();
configuration.setConnector(connector);
configuration.setRepositoryName("MyRepo");
configuration.setRevisionCache(revisionCache);

CDOSession session = configuration.openSession();
Currently you cannot change the cache type used in an IRepository that is created through the XML configuration in a cdo-server.xml file. It is always a default cache (see above), but the capacities of the two fixed size LRU lists can be configured separately:
<?xml version="1.0" encoding="UTF-8"?>
<cdoServer>
  <repository name="MyRepo">
    <property name="currentLRUCapacity" value="100000"/>
    <property name="revisedLRUCapacity" value="100"/>
    ...
  </repository>
</cdoServer>