


Difference between revisions of "CDO/Tweaking Performance"

< CDO
(Defining Fetch Rules Dynamically – CDOFetchAnalyzer)
(Batch processing existing meta models)
(11 intermediate revisions by 4 users not shown)
Line 23: Line 23:
  
 
<br>
 
<br>
 +
===Batch processing existing meta models (using xquery)===
 +
If you have an XQuery processor installed ([http://basex.org, basex] (BSD Licence) is a good choice), you can use the following XQuery script to batch-process your existing ecore files and apply the performance hints above. The script only touches references whose upper bound is -1: it sets resolveProxies to false on all of them, and sets unique to false on those that are neither containment references nor have an opposite.<br>
 +
<pre>
 +
let $dir := "path/to/your/ecore/files/" (: e.g.: "c:/workspace/com.example.model/model/", don't forget the ending '/':)
 +
for $file in file:list($dir, false(), "*.ecore")
 +
let $fullPath := concat($dir, $file)
 +
return
 +
  copy $c := doc($fullPath)
 +
  modify (
 +
    (: this part will set 'Resolve Proxies' to false :)
 +
    for $r in $c//eStructuralFeatures
 +
    where $r[@upperBound = -1 and @*:type = "ecore:EReference"]
 +
    return (
 +
      delete node $r/@resolveProxies,
 +
      insert node attribute {'resolveProxies'}{'false'} into $r
 +
    ),
 +
    (: this part will set 'Unique' to false :)
 +
    for $r in $c//eStructuralFeatures
 +
    where $r[@upperBound = -1 and @*:type = "ecore:EReference" and not(@containment = 'true') and empty(@eOpposite)]
 +
    return (
 +
      delete node $r/@unique,
 +
      insert node attribute {'unique'}{'false'} into $r
 +
    ),
 +
    fn:put($c, $fullPath)
 +
  )
 +
  return $fullPath
 +
</pre>
 +
<br>
 +
Attention: The script overwrites your ecore files in place, so any indentation and formatting will be lost.
  
==Loading Partial Collections – CDOCollectionLoadingPolicy ==
+
==Caching in CDO==  
  
 +
There are three important places in CDO where caches are used:
 +
* '''CDOView''' maintains a cache of '''CDOObjects''' (client side). This cache is always a memory sensitive cache which is not configurable.
 +
* '''CDOSession''' (through CDORevisionManager) maintains a cache of '''CDORevisions''' (client side). This cache implements CDORevisionCache.
 +
* '''IRepository''' (through CDORevisionManager) maintains a cache of '''CDORevisions''' (server side). This cache implements CDORevisionCache.
 +
 +
==Loading Partial Collections – CDOCollectionLoadingPolicy ==
  
 
The CDOCollectionLoadingPolicy feature of the CDOSession controls how a list gets populated.  
 
The CDOCollectionLoadingPolicy feature of the CDOSession controls how a list gets populated.  
Line 59: Line 94:
 
Based on some tests, good performance can be achieved by using the following settings:
 
Based on some tests, good performance can be achieved by using the following settings:
 
<source lang="java">
 
<source lang="java">
   session.options().setCollectionLoadingPolicy (CDOUtil.createLoadCollectionPolicy(0, 300));
+
   session.options().setCollectionLoadingPolicy (CDOUtil.createCollectionLoadingPolicy(0, 300));
 
</source>
 
</source>
 
The code line above means that no CDOIDs should be fetched into the reference lists until the lists are actually accessed.
 
The code line above means that no CDOIDs should be fetched into the reference lists until the lists are actually accessed.
Line 65: Line 100:
 
The end-user could provide its own implementation of the CDOCollectionLoadingPolicy interface.
 
The end-user could provide its own implementation of the CDOCollectionLoadingPolicy interface.
  
== Prefetching Target Objects – CDORevisionPrefetchingPolicy ==
+
== Prefetching Target Objects Automatically – CDORevisionPrefetchingPolicy ==
  
 
The CDORevisionPrefetchingPolicy feature of the CDOView allows CDO users to fetch many objects at a time.
 
The CDORevisionPrefetchingPolicy feature of the CDOView allows CDO users to fetch many objects at a time.
Line 111: Line 146:
 
The end-user could provide its own implementation of the CDORevisionPrefetchingPolicy interface.
 
The end-user could provide its own implementation of the CDORevisionPrefetchingPolicy interface.
 
<br>
 
<br>
 +
 +
== Prefetching Nested Objects Explicitly – cdoPrefetch() ==
 +
 +
As of CDO 3.0 the CDOObject interface supports prefetching of (the revisions for) nested objects, e.g.:
 +
 +
  object.cdoPrefetch(CDORevision.DEPTH_INFINITE);
 +
 
== Defining Fetch Rules Dynamically – CDOFetchAnalyzer ==
 
== Defining Fetch Rules Dynamically – CDOFetchAnalyzer ==
  
Line 119: Line 161:
 
Examples will be available soon. (Contributions welcome!)
 
Examples will be available soon. (Contributions welcome!)
 
<br>
 
<br>
 
==Caching in CDO==
 
 
There are three important places in CDO where caches are used:
 
* '''CDOView''' maintains a cache of '''CDOObjects''' (client side). This cache is always a memory sensitive cache which is not configurable.
 
* '''CDOSession''' (through CDORevisionManager) maintains a cache of '''CDORevisions''' (client side). This cache implements CDORevisionCache which is described [[#Tweaking the CDORevisionCaches|here]].
 
* '''IRepository''' (through IRevisionManager) maintains a cache of '''CDORevisions''' (server side). This cache implements CDORevisionCache which is described [[#Tweaking the CDORevisionCaches|here]].
 
 
===Tweaking the CDORevisionCaches===
 
 
A [http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.emf/org.eclipse.emf.cdo/plugins/org.eclipse.emf.cdo.common/src/org/eclipse/emf/cdo/common/revision/cache/CDORevisionCache.java?root=Modeling_Project&view=co CDORevisionCache] is used by a [http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.emf/org.eclipse.emf.cdo/plugins/org.eclipse.emf.cdo.common/src/org/eclipse/emf/cdo/common/revision/CDORevisionResolver.java?root=Modeling_Project&view=co CDORevisionResolver]. Revision resolvers exist in client sessions and server repositories and mostly differ in the way they react to CDORevisionCache ''misses'':
 
 
* The revision resolver in a client session is a [http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.emf/org.eclipse.emf.cdo/plugins/org.eclipse.emf.cdo/src/org/eclipse/emf/cdo/CDORevisionManager.java?root=Modeling_Project&view=co CDORevisionManager]. If a requested revision does not exist in its CDORevisionCache, the CDORevisionManager loads this revision from the repository (possibly going over the network) and puts it into the cache.
 
* The revision resolver in a server repository is a [http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.emf/org.eclipse.emf.cdo/plugins/org.eclipse.emf.cdo.server/src/org/eclipse/emf/cdo/server/IRevisionManager.java?root=Modeling_Project&view=co IRevisionManager]. If a requested revision does not exist in its CDORevisionCache, the IRevisionManager loads this revision from the persistent back-end store (possibly going over another network) and puts it into the cache.
 
 
All caching aspects (except the cache miss handling mentioned above) are handled uniformly in the common base type of the two revision managers ([http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.emf/org.eclipse.emf.cdo/plugins/org.eclipse.emf.cdo.common/src/org/eclipse/emf/cdo/common/revision/CDORevisionResolver.java?root=Modeling_Project&view=co CDORevisionResolver]):
 
<source lang="Java">
 
public CDORevisionCache getCache();
 
public void setCache(CDORevisionCache cache);
 
</source>
 
 
If the setter is called to configure the instance of CDORevisionCache to be used by the manager it must happen '''before''' the manager is activated (the revision managers are automatically activated when their CDOSession/IRepository is activated). If the setter has not been called before the activation of the manager a '''default''' cache is created and configured (see below).
 
 
As of this writing CDO ships with three different CDORevisionCache implementations:
 
 
* [http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.emf/org.eclipse.emf.cdo/plugins/org.eclipse.emf.cdo.common/src/org/eclipse/emf/cdo/internal/common/revision/cache/lru/LRURevisionCache.java?root=Modeling_Project&view=co LRURevisionCache] is a '''fixed size''' cache with a ''least recently used'' (LRU) eviction policy. An LRURevisionCache maintains two separate LRU lists, one for '''current revisions''' (i.e. those with revised == CDORevision_UNSPECIFIED_TIME) and one for '''revised revisions''' (i.e. those with revised != CDORevision_UNSPECIFIED_TIME). The capacity of the two fixed size LRU lists can be configured separately. To create an LRURevisionCache call <tt>CDORevisionCacheUtil.createLRUCache(int capacityCurrent, int capacityRevised)</tt>.
 
 
* [http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.emf/org.eclipse.emf.cdo/plugins/org.eclipse.emf.cdo.common/src/org/eclipse/emf/cdo/internal/common/revision/cache/mem/MEMRevisionCache.java?root=Modeling_Project&view=co MEMRevisionCache] is a '''memory sensitive''' cache without any special eviction policy (as this is not possible with memory sensitive caching in general). This type of cache can not be configured. To create a MEMRevisionCache call <tt>CDORevisionCacheUtil.createMEMCache()</tt>.
 
 
* [http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.emf/org.eclipse.emf.cdo/plugins/org.eclipse.emf.cdo.common/src/org/eclipse/emf/cdo/internal/common/revision/cache/two/TwoRevisionCache.java?root=Modeling_Project&view=co TwoRevisionCache] is a '''delegating''' cache with two delegation levels. You can set each level independently thereby combining the behaviours of other cache types in a predictable order. Revisions dropped from the first level cache are saved to the second level cache automatically. Cache lookup always delegates to the first level cache and only in case of a miss there it delegates to the second level cache. To create a TwoRevisionCache call <tt>CDORevisionCacheUtil.createTwoLevelCache(CDORevisionCache level1, CDORevisionCache level2)</tt>.
 
 
Of course you can also write your own CDORevisionCache implementation and use it at client and/or server side.
 
 
The default cache (used when no cache has been explicitly set '''before''' revision manager activation) is a TwoRevisionCache with an LRURevisionCache as the first level and a MEMRevisionCache as the second level. The default capacities of the two LRU lists of the fixed size cache (first level) are declared in [http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.emf/org.eclipse.emf.cdo/plugins/org.eclipse.emf.cdo.common/src/org/eclipse/emf/cdo/common/revision/cache/CDORevisionCacheUtil.java?root=Modeling_Project&view=co CDORevisionCacheUtil]:
 
<source lang="Java">
 
public static final int DEFAULT_CAPACITY_CURRENT = 1000;
 
public static final int DEFAULT_CAPACITY_REVISED = 1000;
 
</source>
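
The default arrangement described above (a fixed-size LRU first level that hands evicted entries to a memory-sensitive second level) can be illustrated with a small stand-alone sketch. This is plain Java and not the CDO implementation: the class name TwoLevelCacheSketch and all of its methods are hypothetical, it keeps a single LRU list instead of separate lists for current and revised revisions, and it assumes that soft references are an acceptable stand-in for "memory sensitive".

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: not CDO code. A two-level cache with LRU eviction into a soft-reference map. */
class TwoLevelCacheSketch<K, V> {
    // Level 2: memory sensitive, entries may be cleared by the garbage collector.
    private final Map<K, SoftReference<V>> level2 = new HashMap<>();
    // Level 1: fixed size, least recently used entry is evicted first.
    private final LinkedHashMap<K, V> level1;

    TwoLevelCacheSketch(final int capacity) {
        // accessOrder = true makes the LinkedHashMap keep entries in LRU order.
        level1 = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > capacity) {
                    // Entries dropped from level 1 are saved to level 2.
                    level2.put(eldest.getKey(), new SoftReference<>(eldest.getValue()));
                    return true;
                }
                return false;
            }
        };
    }

    void put(K key, V value) {
        level1.put(key, value);
    }

    /** Looks in level 1 first and falls back to level 2 only on a miss. */
    V get(K key) {
        V value = level1.get(key);
        if (value != null) {
            return value;
        }
        SoftReference<V> ref = level2.get(key);
        return ref == null ? null : ref.get();
    }

    int level1Size() {
        return level1.size();
    }

    boolean inLevel2(K key) {
        return level2.containsKey(key);
    }

    public static void main(String[] args) {
        TwoLevelCacheSketch<String, String> cache = new TwoLevelCacheSketch<>(2);
        cache.put("a", "revision a");
        cache.put("b", "revision b");
        cache.get("a");               // "a" becomes the most recently used entry
        cache.put("c", "revision c"); // evicts "b" into level 2
        System.out.println("b moved to level 2: " + cache.inLevel2("b"));
    }
}
```

A larger first level keeps more entries strongly reachable, while the second level only retains what the garbage collector does not need to reclaim.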
 
 
You should now have an idea of how CDORevisionResolver and CDORevisionCache work together, and of the different cache types and their configuration. As mentioned earlier, it is important to set and configure your caches '''before''' your CDOSession or IRepository is activated. There are many different ways to create, wire and configure these instances; some of them are explained below.
 
 
===CDOSessionConfiguration===
 
At client side you can programmatically open a CDOSession through an instance of CDOSessionConfiguration:
 
<source lang="Java">
 
CDORevisionCache revisionCache = CDORevisionCacheUtil.createTwoLevelCache(
 
  CDORevisionCacheUtil.createLRUCache(100000, 100),
 
  CDORevisionCacheUtil.createMEMCache());
 
 
CDOSessionConfiguration configuration = CDOUtil.createSessionConfiguration();
 
configuration.setConnector(connector);
 
configuration.setRepositoryName("MyRepo");
 
configuration.setRevisionCache(revisionCache);
 
 
CDOSession session = configuration.openSession();
 
</source>
 
 
===cdo-server.xml===
 
 
Currently you can not change the cache '''type''' used in an IRepository that is created through the XML configuration in a cdo-server.xml file. It is always a default cache (see above) but the capacity of the two fixed size LRU lists can be configured separately:
 
 
<source lang="xml">
 
<?xml version="1.0" encoding="UTF-8"?>
 
<cdoServer>
 
 
  <repository name="MyRepo">
 
    <property name="currentLRUCapacity" value="100000"/>
 
    <property name="revisedLRUCapacity" value="100"/>
 
 
    ...
 
 
  </repository>
 
 
</cdoServer>
 
</source>
 
  
  
 
----
 
----
 
Wikis: [[CDO]] | [[Net4j]] | [[EMF]] | [[Eclipse]]
 
Wikis: [[CDO]] | [[Net4j]] | [[EMF]] | [[Eclipse]]

Revision as of 05:31, 19 December 2012

This document describes ways to get the best performance out of CDO. It is intended for both novice and expert CDO users and is based on CDO 2.0.0 (HEAD at the time of writing).

Speeding up CDO is our constant goal and task. If you have any questions or suggestions, do not hesitate to contact any member of the CDO team.


Setting EMF Parameters

The first piece of advice for improving CDO performance concerns model definition. It does not involve CDO directly, but the way a model is defined can make CDO appear slow. Here are a few things to consider when defining a model:

  • For one-to-many relationships, the Unique property should be set to “false”. Otherwise, add and set operations will fetch all objects in the list.
  • If the Unique property absolutely has to be “true”, the reference should at least be a containment reference or have a many-to-one opposite. That way, EMF (starting from version 2.5) can speed up insertion by checking the inverse reference (eContainer or the opposite reference) instead of scanning the list.

FeatureProperties.jpg

  • The Resolve Proxies property should be set to “false” as well in one-to-many relationships. Otherwise, performance may degrade in some cases. The internal structure of CDO never creates EMF proxies, even when it references external data in a non-CDO resource. CDO loads the referenced objects when the list is accessed.
  • In any case, both properties (Unique and Resolve Proxies) should rarely be enabled at the same time, especially without an opposite single reference.

With these simple changes, CDO users can get a twentyfold performance improvement in their application. It is worth trying: add 10,000 elements to a list, with and without these changes, and compare the results.
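
To see why a unique list gets expensive, the following stand-alone sketch counts the comparisons needed to add elements when every add must first rule out duplicates by scanning the list. It is plain Java, not EMF or CDO code, and the class and method names are made up for illustration. In CDO each of those comparisons may additionally require the list element to be loaded, which is what makes the effect so noticeable.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: plain Java, not EMF or CDO code. */
class UniqueListCost {
    /** Adds n distinct elements and returns the number of element comparisons made. */
    static long comparisonsForAdds(int n, boolean unique) {
        List<Integer> list = new ArrayList<>();
        long comparisons = 0;
        for (int i = 0; i < n; i++) {
            if (unique) {
                // A unique list has to check every existing element before each add.
                for (Integer existing : list) {
                    comparisons++;
                    if (existing == i) {
                        break;
                    }
                }
            }
            list.add(i);
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println("unique=true:  " + comparisonsForAdds(10_000, true));
        System.out.println("unique=false: " + comparisonsForAdds(10_000, false));
    }
}
```

For 10,000 distinct elements the unique variant performs 49,995,000 comparisons and the non-unique variant performs none, so the cost grows quadratically with the list size.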


Batch processing existing meta models (using xquery)

If you have an XQuery processor installed (BaseX, under the BSD licence, is a good choice), you can use the following XQuery script to batch-process your existing ecore files and apply the performance hints above. The script only touches references whose upper bound is -1: it sets resolveProxies to false on all of them, and sets unique to false on those that are neither containment references nor have an opposite.

let $dir := "path/to/your/ecore/files/" (: e.g.: "c:/workspace/com.example.model/model/", don't forget the ending '/':)
for $file in file:list($dir, false(), "*.ecore")
let $fullPath := concat($dir, $file)
return
  copy $c := doc($fullPath)
  modify (
    (: this part will set 'Resolve Proxies' to false :) 
    for $r in $c//eStructuralFeatures
    where $r[@upperBound = -1 and @*:type = "ecore:EReference"]
    return (
      delete node $r/@resolveProxies,
      insert node attribute {'resolveProxies'}{'false'} into $r
    ),
    (: this part will set 'Unique' to false :)
    for $r in $c//eStructuralFeatures
    where $r[@upperBound = -1 and @*:type = "ecore:EReference" and not(@containment = 'true') and empty(@eOpposite)]
    return (
      delete node $r/@unique,
      insert node attribute {'unique'}{'false'} into $r
    ),
    fn:put($c, $fullPath)
  )
  return $fullPath


Attention: The script overwrites your ecore files in place, so any indentation and formatting will be lost.

Caching in CDO

There are three important places in CDO where caches are used:

  • CDOView maintains a cache of CDOObjects (client side). This cache is always a memory sensitive cache which is not configurable.
  • CDOSession (through CDORevisionManager) maintains a cache of CDORevisions (client side). This cache implements CDORevisionCache.
  • IRepository (through CDORevisionManager) maintains a cache of CDORevisions (server side). This cache implements CDORevisionCache.

Loading Partial Collections – CDOCollectionLoadingPolicy

The CDOCollectionLoadingPolicy feature of the CDOSession controls how a list gets populated.

By default, when an object is fetched, all its fields are filled with the proper values. See Figure 1.

Tweaking CDO Performance Figure1.jpg

This could be time-consuming, especially if the ref1 reference does not need to be accessed.

In CDO it is possible to fetch collections partially. The CDOCollectionLoadingPolicy feature defines how a list will be loaded.

The implementation that is shipped with CDO makes a distinction between the two following situations:

  • How many CDOIDs to fill when an object is loaded for the first time (initialChunkSize);
  • Which elements to fill with CDOIDs when the accessed element is not yet filled (numberOfIndexToResolve).

Both values are passed to the factory method:

        CDOUtil.createCollectionLoadingPolicy(initialChunkSize, numberOfIndexToResolve);

Example: Let's suppose that the implementation is defined as follows:

CDOCollectionLoadingPolicy policy = CDOUtil.createCollectionLoadingPolicy(10, 20);
session.options().setCollectionLoadingPolicy(policy);

When the oid1 object gets fetched for the first time, only the first ten CDOIDs will be loaded for every list attribute it has. This changes nothing for the ref1 list since it contains only 3 items. However, the ref2 list will contain ten items only:

Tweaking CDO Performance Figure2.jpg

As soon as any element beyond the tenth element gets accessed in the list, CDO asks the CDOCollectionLoadingPolicy feature to fill more elements. The example policy would load twenty more CDOIDs into the list.

Also, if the list is accessed by index, CDO does not need to fetch items from the beginning of the list; it fetches only the range determined by the CDOCollectionLoadingPolicy feature.
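
The chunked behaviour described above can be simulated with a short stand-alone sketch. It is plain Java and does not use the CDO API: the class ChunkedIdList and its methods are hypothetical, and it assumes that a miss loads a chunk starting at the accessed index, which may differ from what the shipped policy actually does.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a stand-in for a partially loaded reference list, not CDO code. */
class ChunkedIdList {
    private final List<Long> serverIds;  // the complete list as stored on the "server"
    private final Long[] localIds;       // null means "not loaded yet"
    private final int resolveChunkSize;
    private int serverRequests = 0;

    ChunkedIdList(List<Long> serverIds, int initialChunkSize, int resolveChunkSize) {
        this.serverIds = serverIds;
        this.localIds = new Long[serverIds.size()];
        this.resolveChunkSize = Math.max(resolveChunkSize, 1);
        // The initial chunk arrives together with the owning object, so it costs no extra request.
        copy(0, Math.min(initialChunkSize, serverIds.size()));
    }

    /** Returns the id at the given index, loading one more chunk if it is missing. */
    Long get(int index) {
        if (localIds[index] == null) {
            serverRequests++;
            copy(index, Math.min(index + resolveChunkSize, serverIds.size()));
        }
        return localIds[index];
    }

    int loadedCount() {
        int count = 0;
        for (Long id : localIds) {
            if (id != null) {
                count++;
            }
        }
        return count;
    }

    int serverRequests() {
        return serverRequests;
    }

    private void copy(int from, int to) {
        for (int i = from; i < to; i++) {
            localIds[i] = serverIds.get(i);
        }
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 50; i++) {
            ids.add(i);
        }
        ChunkedIdList list = new ChunkedIdList(ids, 10, 20);
        list.get(10); // first access beyond the initial chunk
        System.out.println(list.loadedCount() + " ids loaded after " + list.serverRequests() + " request(s)");
    }
}
```

With an initial chunk of 10 and a resolve chunk of 20, accessing index 10 of a 50-element list triggers one request and leaves 30 ids loaded.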

Based on some tests, good performance can be achieved by using the following settings:

 
  session.options().setCollectionLoadingPolicy (CDOUtil.createCollectionLoadingPolicy(0, 300));

The line above means that no CDOIDs are fetched into the reference lists until the lists are actually accessed; from then on, CDOIDs are loaded in chunks of 300.

End users can also provide their own implementation of the CDOCollectionLoadingPolicy interface.

Prefetching Target Objects Automatically – CDORevisionPrefetchingPolicy

The CDORevisionPrefetchingPolicy feature of the CDOView allows CDO users to fetch many objects at a time.

The difference between the CDOCollectionLoadingPolicy feature and the CDORevisionPrefetchingPolicy feature is subtle. The CDOCollectionLoadingPolicy feature determines how and when to fetch CDOIDs, while the CDORevisionPrefetchingPolicy feature determines how and when to resolve CDOIDs (i.e. fetch the target objects).

What happens when list items are being accessed? The list fetches objects one at a time.

As an example, here is what happens while iterating through the ref1 list:


Tweaking CDO Performance Figure3.jpg


  1. iterator.next();
  2. oid3 is not in the cache, load oid3
  3. iterator.next();
  4. oid4 is not in the cache, load oid4
  5. iterator.next();
  6. oid5 is not in the cache, load oid5


Steps 2, 4 and 6 are the slowest operations. Since oid3 is not in the cache, it will be fetched from the server. Every object will be fetched sequentially.
Why not be smarter? Why not load more objects at a time? This would reduce the number of client-server round trips. When oid3 is being loaded, oid4 and oid5 could be loaded at the same time.

  1. iterator.next();
  2. oid3 is not in the cache, load oid3, oid4, oid5
  3. iterator.next();
  4. oid4 is in the cache
  5. iterator.next();
  6. oid5 is in the cache


Instead of three calls, only one is made to the server. How many calls would be saved for a list containing 100 or 10,000 items?
The prefetching policy is set on the view's options with setRevisionPrefetchingPolicy. For example:

view.options().setRevisionPrefetchingPolicy (CDOUtil.createRevisionPrefetchingPolicy(10));


End users can also provide their own implementation of the CDORevisionPrefetchingPolicy interface.
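
The saving in round trips can be estimated with a small simulation. The sketch below is plain Java rather than CDO code: the class PrefetchSimulation and the method roundTrips are hypothetical, the ids are assumed to be distinct, and a cache miss is assumed to load the missing object together with the objects that follow it in the list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: counts simulated client-server round trips, not CDO code. */
class PrefetchSimulation {
    /** Iterates over the ids in order and returns how many round trips were needed. */
    static int roundTrips(List<Long> ids, int chunkSize) {
        Set<Long> cache = new HashSet<>();
        int trips = 0;
        for (int i = 0; i < ids.size(); i++) {
            if (!cache.contains(ids.get(i))) {
                // Cache miss: load this object plus the next ones in a single request.
                trips++;
                int end = Math.min(i + Math.max(chunkSize, 1), ids.size());
                cache.addAll(ids.subList(i, end));
            }
        }
        return trips;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 10_000; i++) {
            ids.add(i);
        }
        System.out.println("chunk size 1:  " + roundTrips(ids, 1) + " round trips");
        System.out.println("chunk size 10: " + roundTrips(ids, 10) + " round trips");
    }
}
```

Under these assumptions a chunk size of 10 reduces the round trips for a list of 100 items from 100 to 10, and for a list of 10,000 items from 10,000 to 1,000.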

Prefetching Nested Objects Explicitly – cdoPrefetch()

As of CDO 3.0 the CDOObject interface supports prefetching of (the revisions for) nested objects, e.g.:

  object.cdoPrefetch(CDORevision.DEPTH_INFINITE);

Defining Fetch Rules Dynamically – CDOFetchAnalyzer

In many applications, hard coded rules are used to determine what to fetch. This is mainly to speed up applications. Basically, these rules define, for a specific context, which path to load from a root object. By doing that, only the data that needs to be loaded will be loaded. Usually, these rules are really hard to maintain: models change, applications change, ...

The CDOFetchAnalyzer feature can be used to define rules, but it does so in a dynamic fashion. It detects patterns in the way objects are accessed in a specific context and, when that context comes back, it loads the same path from different root objects.
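
The general idea of learning fetch rules from observed accesses can be sketched as follows. This is not the CDOFetchAnalyzer API: the class AccessPatternRecorder, its methods, and the representation of contexts and paths as strings are all assumptions made for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: records which paths are followed in a context, not CDO code. */
class AccessPatternRecorder {
    private final Map<String, Set<String>> pathsByContext = new HashMap<>();

    /** Called whenever the application follows a reference path while in the given context. */
    void recordAccess(String context, String featurePath) {
        pathsByContext.computeIfAbsent(context, k -> new LinkedHashSet<>()).add(featurePath);
    }

    /** Returns the paths worth loading up front the next time the context is entered. */
    Set<String> fetchPlan(String context) {
        Set<String> paths = pathsByContext.getOrDefault(context, Collections.emptySet());
        return Collections.unmodifiableSet(paths);
    }

    public static void main(String[] args) {
        AccessPatternRecorder recorder = new AccessPatternRecorder();
        // First visit of the context: accesses are observed and remembered.
        recorder.recordAccess("orderView", "customer");
        recorder.recordAccess("orderView", "items");
        recorder.recordAccess("orderView", "customer");
        // Later visit with a different root object: the remembered paths form the fetch plan.
        System.out.println(recorder.fetchPlan("orderView"));
    }
}
```

Because the plan is derived from what the application actually accessed, it follows model and application changes without any hand-written rules to maintain.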

Examples will be available soon. (Contributions welcome!)



Wikis: CDO | Net4j | EMF | Eclipse
