EclipseLink/UserGuide/JPA/sandbox/caching/Cache Concepts
Latest revision as of 15:50, 22 February 2011
- 1 Cache Concepts
- 1.1 Cache Type and Object Identity
- 1.1.1 Full Identity Map
- 1.1.2 Weak Identity Map
- 1.1.3 Soft Identity Map
- 1.1.4 Soft Cache Weak Identity Map and Hard Cache Weak Identity Map
- 1.1.5 No Identity Map
- 1.1.6 Guidelines for Configuring the Cache and Identity Maps
- 1.1.7 What You May Need to Know About the Internals of Weak, Soft, and Hard Identity Maps
- 1.2 Querying and the Cache
- 1.3 Handling Stale Data
- 1.4 Explicit Query Refreshes
- 1.5 Cache Invalidation
- 1.6 Cache Coordination
- 1.7 Cache Isolation
- 1.8 Cache Locking and Transaction Isolation
- 1.9 Cache Optimization
The following sections describe concepts unique to the EclipseLink cache:
- Cache Type and Object Identity
- Querying and the Cache
- Handling Stale Data
- Explicit Query Refreshes
- Cache Invalidation
- Cache Coordination
- Cache Isolation
- Cache Locking and Transaction Isolation
- Cache Optimization
Cache Type and Object Identity
EclipseLink preserves object identity through its cache using the primary key attributes of a persistent entity. These attributes may or may not be assigned through sequencing (see Projects and Sequencing). In a Java application, object identity is preserved if each object in memory is represented by one, and only one, object instance. Multiple retrievals of the same object return references to the same object instance, not multiple copies of the same object.
Maintaining object identity is extremely important when the application's object model contains circular references between objects. You must ensure that two objects that refer to each other reference the actual instances, rather than copies of each other. Object identity is also important when multiple parts of the application may be modifying the same object simultaneously.
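The following Java sketch makes this concrete with a minimal identity map keyed by primary key. It is a simplified model and not EclipseLink's implementation. The Employee class, its id field (which stands in for the primary key), and the Map used as a stand-in database are all hypothetical. Repeated reads of the same key return the same instance and hit the "database" only once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity: "id" plays the role of the primary key attribute.
class Employee {
    final long id;
    String name;

    Employee(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Minimal identity map sketch: at most one in-memory instance per primary key.
class SimpleIdentityMap {
    private final Map<Long, Employee> byPrimaryKey = new HashMap<>();
    private final Map<Long, String> fakeDatabase; // stand-in for the data source
    int databaseReads = 0;

    SimpleIdentityMap(Map<Long, String> fakeDatabase) {
        this.fakeDatabase = fakeDatabase;
    }

    // Returns the cached instance if present; otherwise builds it from the
    // "row" and caches it so later reads return the very same object.
    Employee find(long id) {
        Employee cached = byPrimaryKey.get(id);
        if (cached != null) {
            return cached;
        }
        String row = fakeDatabase.get(id);
        if (row == null) {
            return null;
        }
        databaseReads++;
        Employee built = new Employee(id, row);
        byPrimaryKey.put(id, built);
        return built;
    }
}
```

Because both references point at one instance, a change made through either one is visible through the other. Copies could not provide that guarantee.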
We recommend that you always maintain object identity. Disable object identity only if absolutely necessary, for example, for read-only objects (see Configuring Read-Only Descriptors).
You can configure how object identity is managed on a class-by-class basis. The ClassDescriptor object provides the cache and identity map options described in this table.
Cache and Identity Map Options
|Option (Identity Map)||Caching||Guaranteed Identity||Memory Use|
For more information, see Guidelines for Configuring the Cache and Identity Maps.
Full Identity Map
This option provides full caching and guaranteed identity: objects are never flushed from memory unless they are deleted.
It caches all objects and does not remove them. Cache size doubles whenever the maximum size is reached. This option may be memory-intensive when many objects are read. Do not use this option for batch operations.
We recommend using this identity map when the data set size is small and memory is in large supply.
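The growth rule above can be modeled in a few lines of Java. This is a hypothetical sketch, not EclipseLink code. The assumption is that maxSize models the configured identity map size and that, instead of evicting, the limit doubles when it is reached. Objects leave only through an explicit remove, which here models a delete.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a "full" identity map: hard references, nothing is ever evicted.
class FullIdentityMapSketch<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private int maxSize;

    FullIdentityMapSketch(int initialMaxSize) {
        this.maxSize = initialMaxSize;
    }

    void put(K key, V value) {
        // When a new key would exceed the limit, grow the limit instead of dropping objects.
        if (!entries.containsKey(key) && entries.size() >= maxSize) {
            maxSize *= 2;
        }
        entries.put(key, value);
    }

    V get(K key) { return entries.get(key); }

    // Only an explicit removal (for example, after a delete) frees an entry.
    void remove(K key) { entries.remove(key); }

    int size() { return entries.size(); }

    int maxSize() { return maxSize; }
}
```

Since nothing is ever released, memory use tracks the number of distinct objects read. That is why this option suits small, bounded data sets and not batch reads.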
Weak Identity Map
This option is similar to the full identity map, except that the map holds the objects by using weak references. This method allows full garbage collection and provides full caching and guaranteed identity.
The weak identity map uses less memory than full identity map but also does not provide a durable caching strategy across client/server transactions. Objects are available for garbage collection when the application no longer references them on the server side (that is, from within the server JVM).
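As a rough illustration, a weak identity map can be built on java.lang.ref.WeakReference. The Java sketch below is a simplified model and not EclipseLink's class. An entry keeps returning the same instance only while the application still holds a strong reference to it. After the garbage collector clears the referent, get returns null, and the stale map entry is purged through a ReferenceQueue.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Sketch of a weak identity map: cached objects are reachable only weakly.
class WeakIdentityMapSketch<K, V> {
    // Weak reference that remembers its key so cleared entries can be removed.
    private static final class Entry<EK, EV> extends WeakReference<EV> {
        final EK key;

        Entry(EK key, EV value, ReferenceQueue<? super EV> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, Entry<K, V>> entries = new HashMap<>();
    private final ReferenceQueue<V> cleared = new ReferenceQueue<>();

    void put(K key, V value) {
        purge();
        entries.put(key, new Entry<>(key, value, cleared));
    }

    // Returns null once the JVM has garbage-collected the object.
    V get(K key) {
        purge();
        Entry<K, V> e = entries.get(key);
        return e == null ? null : e.get();
    }

    // Remove map entries whose referents have been cleared by the garbage collector.
    private void purge() {
        Entry<?, ?> e;
        while ((e = (Entry<?, ?>) cleared.poll()) != null) {
            entries.remove(e.key, e); // only if the key still maps to this entry
        }
    }
}
```

A soft identity map would follow the same pattern, with SoftReference in place of WeakReference.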
Soft Identity Map
This option is similar to the weak identity map, except that the map uses soft references instead of weak references. This method allows full garbage collection and provides full caching and guaranteed identity.
The soft identity map allows for optimal caching of the objects, while still allowing the JVM to garbage collect the objects if memory is low.
Soft Cache Weak Identity Map and Hard Cache Weak Identity Map
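These options are similar to the weak identity map except that they maintain a subcache of the most recently used objects. The subcache holds these objects through soft references (soft cache weak identity map) or hard references (hard cache weak identity map). Softly referenced objects are garbage-collected only if the system is low on memory, and hard-referenced objects are not garbage-collected while they remain in the subcache.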
The soft cache weak identity map and hard cache weak identity map provide more efficient memory use. They release objects as they are garbage-collected, except for a fixed number of most recently used objects. Note that weakly cached objects might be flushed if the transaction spans multiple client/server invocations. The size of the subcache is proportional to the size of the identity map as specified by the ClassDescriptor method setIdentityMapSize. You should set this cache size to be as large as the maximum number of objects (of the same type) referenced within a transaction (see Configuring Cache Type and Size at the Descriptor Level).
We recommend using this identity map in most circumstances as a means to control memory used by the cache.
For more information, see What You May Need to Know About the Internals of Weak, Soft, and Hard Identity Maps.
No Identity Map
This option does not preserve object identity and does not cache objects.
We do not recommend using the no identity map option. Instead, review the alternatives of cache invalidation and isolated caching.
Guidelines for Configuring the Cache and Identity Maps
You can configure the cache at the project (Configuring Cache Type and Size at the Project Level) or descriptor (Configuring Cache Type and Size at the Descriptor Level) level.
Use the following guidelines when configuring your cache and identity map:
- If objects with a long life span and object identity are important, use a SoftIdentityMap, SoftCacheWeakIdentityMap, or HardCacheWeakIdentityMap policy. For more information on which one to choose, see What You May Need to Know About the Internals of Weak, Soft, and Hard Identity Maps.
- If object identity is important, but caching is not, use a WeakIdentityMap policy.
- If an object has a long life span or requires frequent access, or object identity is important, use a FullIdentityMap policy.
WARNING: Use the FullIdentityMap only if the class has a small number of finite instances. Otherwise, a memory leak will occur.
- If an object has a short life span or requires frequent access, and identity is not important, use a CacheIdentityMap policy.
- If objects are discarded immediately after being read from the database, such as in a batch operation, use a NoIdentityMap policy. The NoIdentityMap does not preserve object identity.
Note: We do not recommend the use of CacheIdentityMap and NoIdentityMap policies.
What You May Need to Know About the Internals of Weak, Soft, and Hard Identity Maps
The WeakIdentityMap and SoftIdentityMap use JVM weak and soft references to ensure that any object referenced by the application is held in the cache. Once the application releases its reference to an object, the JVM is free to garbage-collect it. The JVM determines when a weak or soft reference is cleared. In general, you can expect weakly referenced objects to be collected on each garbage collection cycle, and softly referenced objects to be collected only when the JVM determines that memory is low.
The SoftCacheWeakIdentityMap and HardCacheWeakIdentityMap types of identity map contain the following two caches:
- Reference cache: implemented as a LinkedList that contains soft or hard references, respectively.
- Weak cache: implemented as a HashMap that contains weak references.
When you create a SoftCacheWeakIdentityMap or HardCacheWeakIdentityMap with a specified size, the reference cache LinkedList is exactly this size. The weak cache HashMap is initialized to 100 percent of the specified size: the weak cache will grow when more objects than the specified size are read in. Because EclipseLink does not control garbage collection, the JVM can reap the weakly held objects whenever it sees fit.
Because the reference cache is implemented as a LinkedList, new objects are added to the end of the list. As a result, it behaves as a least recently used (LRU) cache. It has a fixed size, and once that maximum size has been reached, the object at the head of the list is removed when a new object is added.
The SoftCacheWeakIdentityMap and HardCacheWeakIdentityMap are essentially the same type of identity map. The HardCacheWeakIdentityMap was constructed to work around an issue with some JVMs.
If your application reaches a low system memory condition frequently enough, or if your platform's JVM treats weak and soft references the same, the objects in the reference cache may be garbage-collected so often that you will not benefit from the performance improvement it provides. If this is the case, we recommend that you use the HardCacheWeakIdentityMap. It is identical to the SoftCacheWeakIdentityMap except that it uses hard references in the reference cache. This guarantees that your application will benefit from the performance improvement provided by the reference cache.
When an object in a HardCacheWeakIdentityMap or SoftCacheWeakIdentityMap is pushed out of the reference cache, it is held only in the weak cache. Although it is still cached, EclipseLink cannot guarantee that it will be there for any length of time, because the JVM can decide to garbage-collect weakly referenced objects at any time.
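The two-level structure described above can be approximated in Java as shown below. This is a hypothetical, simplified model, not EclipseLink's HardCacheWeakIdentityMap. Every object is indexed through a WeakReference, and the most recently used ones are also pinned by hard references in a fixed-size LinkedList. One assumption is that a cache hit moves the object to the tail of the list, which gives the LRU behavior. The other is that cleared weak entries are not purged, to keep the sketch short.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Map;

// Sketch: weak cache for all objects plus a fixed-size LRU list of hard references.
class HardCacheWeakIdentityMapSketch<K, V> {
    private final int referenceCacheSize;
    private final LinkedList<V> referenceCache = new LinkedList<>(); // head = least recently used
    private final Map<K, WeakReference<V>> weakCache = new HashMap<>();

    HardCacheWeakIdentityMapSketch(int referenceCacheSize) {
        this.referenceCacheSize = referenceCacheSize;
    }

    void put(K key, V value) {
        weakCache.put(key, new WeakReference<>(value));
        touch(value);
    }

    V get(K key) {
        WeakReference<V> ref = weakCache.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value != null) {
            touch(value); // a hit makes the object the most recently used
        }
        return value;
    }

    // True if the object is currently pinned by a hard reference.
    boolean isPinned(V value) {
        for (V pinned : referenceCache) {
            if (pinned == value) return true;
        }
        return false;
    }

    // Move the object to the tail (identity comparison); evict the head when over capacity.
    private void touch(V value) {
        Iterator<V> it = referenceCache.iterator();
        while (it.hasNext()) {
            if (it.next() == value) {
                it.remove();
                break;
            }
        }
        referenceCache.addLast(value);
        if (referenceCache.size() > referenceCacheSize) {
            // The evicted object is now reachable only through the weak cache.
            referenceCache.removeFirst();
        }
    }
}
```

Swapping the hard references in the LinkedList for SoftReference wrappers would give the soft-cache variant. The trade-off is the one described above: those entries can be reclaimed under memory pressure.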
Querying and the Cache
A query that is run against the shared session cache is known as an in-memory query. Careful configuration of in-memory querying can improve performance (see How to Use In-Memory Queries).
By default, a query that looks for a single object based on primary key attempts to retrieve the required object from the cache first, and searches the data source only if the object is not in the cache. All other query types search the database first. You can specify whether a given query runs against the in-memory cache, the database, or both.
For more information, see Queries and the Cache.
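The default lookup order can be sketched in Java as follows. This is an illustration only. The QuerySketch class, the Map standing in for the database, and the prefix-match query are hypothetical, and no EclipseLink query API is used. A primary-key read consults the cache first, while the other query always goes to the data source and reuses any cached instance it finds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of default query behavior: primary-key reads are cache-first,
// other queries go to the data source first.
class QuerySketch {
    final Map<Integer, String> database = new HashMap<>(); // stand-in data source
    final Map<Integer, String> cache = new HashMap<>();
    int databaseHits = 0;

    // Read by primary key: check the cache, consult the data source only on a miss.
    String readByPrimaryKey(int id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        databaseHits++;
        String row = database.get(id);
        if (row != null) {
            cache.put(id, row);
        }
        return row;
    }

    // Any other query (here, a prefix match) always hits the data source.
    List<String> readAllStartingWith(String prefix) {
        databaseHits++;
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, String> row : database.entrySet()) {
            if (row.getValue().startsWith(prefix)) {
                // Reuse the cached instance if one exists; otherwise cache this row.
                result.add(cache.computeIfAbsent(row.getKey(), k -> row.getValue()));
            }
        }
        return result;
    }
}
```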
Handling Stale Data
Stale data is an artifact of caching, in which an object in the cache is not the most recent version committed to the data source. To avoid stale data, implement an appropriate cache locking strategy.
By default, EclipseLink optimizes concurrency to minimize cache locking during read or write operations. Use the default EclipseLink isolation level, unless you have a very specific reason to change it. For more information on isolation levels in EclipseLink, see Cache Isolation.
Cache locking regulates when processes read or write an object. Depending on how you configure it, cache locking determines whether a process can read or write an object that is in use within another process.
A well-managed cache makes your application more efficient. There are very few cases in which you should turn the cache off entirely, because the cache reduces database access and is an important part of managing object identity.
To make the most of your cache strategy and to minimize your application's exposure to stale data, we recommend the following:
- Configuring a Locking Policy
- Configuring the Cache on a Per-Class Basis
- Forcing a Cache Refresh when Required on a Per-Query Basis
- Configuring Cache Invalidation
- Configuring Cache Coordination
Configuring a Locking Policy
Make sure you configure a locking policy so that you can prevent or at least identify when values have already changed on an object you are modifying. Typically, this is done using optimistic locking. EclipseLink offers several locking policies such as numeric version field, time-stamp version field, and some or all fields.
For more information, see Configuring Locking Policy.
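As an example of the numeric version-field approach, the Java sketch below compares the version an object was read at with the current row version before writing. A mismatch means another process committed first. This is a simplified model, and the class and exception names are hypothetical. In a relational database the same check is usually expressed as an UPDATE whose WHERE clause includes the expected version.

```java
// Hypothetical exception signalling that the edited copy is stale.
class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) {
        super(message);
    }
}

// A row value plus the numeric version it was read at.
class VersionedRow {
    String value;
    int version;

    VersionedRow(String value, int version) {
        this.value = value;
        this.version = version;
    }
}

// Sketch of numeric version-field optimistic locking on a single row.
class OptimisticStore {
    private final VersionedRow row = new VersionedRow("initial", 1);

    // Returns a copy, remembering the version seen at read time.
    VersionedRow read() {
        return new VersionedRow(row.value, row.version);
    }

    // Same idea as: UPDATE ... SET value = ?, version = version + 1 WHERE version = ?
    void update(VersionedRow edited) {
        if (edited.version != row.version) {
            throw new StaleVersionException(
                "expected version " + edited.version + " but found " + row.version);
        }
        row.value = edited.value;
        row.version++;
        edited.version = row.version;
    }

    int currentVersion() {
        return row.version;
    }
}
```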
Configuring the Cache on a Per-Class Basis
If other applications can modify the data used by a particular class, use a weaker style of cache for the class. For example, the SoftCacheWeakIdentityMap or WeakIdentityMap minimizes the length of time the cache maintains an object whose reference has been removed.
For more information, see Configuring Cache Type and Size at the Descriptor Level.
Forcing a Cache Refresh when Required on a Per-Query Basis
Any query can include a flag that forces EclipseLink to go to the data source for the most up-to-date version of selected objects and update the cache with this information.
For more information, see the following:
Configuring Cache Invalidation
Using the descriptor API, you can designate an object as invalid: when any query attempts to read an invalid object, EclipseLink will go to the data source for the most up-to-date version of that object and update the cache with this information. You can manually designate an object as invalid or use a CacheInvalidationPolicy to control the conditions under which an object is designated invalid.
For more information, see Cache Invalidation.
Configuring Cache Coordination
If your application is primarily read-based and the changes are all being performed by the same Java application operating with multiple, distributed sessions, you may consider using the EclipseLink cache coordination feature. Although this will not prevent stale data, it should greatly minimize it.
For more information, see Cache Coordination.
Explicit Query Refreshes
Some distributed systems require only a small number of objects to be consistent across the servers in the system. Conversely, other systems require that several specific objects must always be guaranteed to be up-to-date, regardless of the cost. If you build such a system, you can explicitly refresh selected objects from the database at appropriate intervals, without incurring the full cost of distributed cache coordination.
To implement this type of strategy, do the following:
- Configure a set of queries that refresh the required objects.
- Establish an appropriate refresh policy.
- Invoke the queries as required to refresh the objects.
When you execute a query, if the required objects are in the cache, EclipseLink returns the cached objects without checking the database for a more recent version. This reduces the number of objects that EclipseLink must build from database results, and is optimal for noncoordinated cache environments. However, this may not always be the best strategy for a coordinated cache environment.
To override this behavior, set a refresh policy that specifies that the objects from the database always take precedence over objects in the cache. This updates the cached objects with the data from the database.
You can implement this type of refresh policy on each EclipseLink descriptor, or just on certain queries, depending upon the nature of the application.
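Here is a minimal Java sketch of the difference between the default behavior and a "database takes precedence" refresh. It is not EclipseLink's refresh API. The Account class, the refresh flag, and the Map standing in for the database are hypothetical. Without the flag, the cached object is returned unchanged. With it, the current row is copied into the cached instance, so identity is preserved while the state is brought up to date.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cached entity.
class Account {
    final int id;
    long balance;

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }
}

// Sketch of a reader whose queries can optionally refresh cached objects.
class RefreshingReader {
    final Map<Integer, Long> database = new HashMap<>(); // stand-in data source
    final Map<Integer, Account> cache = new HashMap<>();

    Account read(int id, boolean refresh) {
        Account cached = cache.get(id);
        if (cached != null && !refresh) {
            return cached; // default: cached object wins, database is not checked
        }
        Long row = database.get(id);
        if (row == null) {
            cache.remove(id); // row no longer exists
            return null;
        }
        if (cached == null) {
            cached = new Account(id, row);
            cache.put(id, cached);
        } else {
            cached.balance = row; // refresh: database state overwrites cached state
        }
        return cached;
    }
}
```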
For more information, see the following:
Note: Refreshing does not prevent phantom reads from occurring.
Cache Invalidation
By default, objects remain in the session cache until they are explicitly deleted (see Deleting Objects) or garbage collected when using a weak identity map (see Configuring Cache Type and Size at the Project Level).
Alternatively, you can configure any object with a CacheInvalidationPolicy that lets you specify, either automatically or manually, under what circumstances a cached object is invalid: when any query attempts to read an invalid object, EclipseLink will go to the data source for the most up-to-date version of that object, and update the cache with this information.
You can use any of the following CacheInvalidationPolicy instances:
- DailyCacheInvalidationPolicy: the object is automatically flagged as invalid at a specified time of day.
- NoExpiryCacheInvalidationPolicy: the object can only be flagged as invalid by explicitly calling org.eclipse.persistence.sessions.IdentityMapAccessor method invalidateObject.
- TimeToLiveCacheInvalidationPolicy: the object is automatically flagged as invalid after a specified time period has elapsed since the object was read.
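The expiry rules listed above can be modeled in Java roughly as follows. These are hypothetical, simplified rules named after the policies, not the EclipseLink classes themselves, and they decide validity from the time an object was read. The manual invalidation that NoExpiryCacheInvalidationPolicy relies on is not modeled; the no-expiry rule simply never expires on its own.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical rule interface: is an object read at readTime still valid at now?
interface InvalidationRule {
    boolean isInvalid(Instant readTime, Instant now);
}

// Modeled on NoExpiryCacheInvalidationPolicy: never expires by itself.
class NoExpiryRule implements InvalidationRule {
    public boolean isInvalid(Instant readTime, Instant now) {
        return false;
    }
}

// Modeled on TimeToLiveCacheInvalidationPolicy: expires timeToLive after the read.
class TimeToLiveRule implements InvalidationRule {
    private final Duration timeToLive;

    TimeToLiveRule(Duration timeToLive) {
        this.timeToLive = timeToLive;
    }

    public boolean isInvalid(Instant readTime, Instant now) {
        return !now.isBefore(readTime.plus(timeToLive));
    }
}

// Modeled on DailyCacheInvalidationPolicy: expires at the first occurrence
// of the daily expiry time after the object was read.
class DailyRule implements InvalidationRule {
    private final LocalTime expiryTime;
    private final ZoneId zone;

    DailyRule(LocalTime expiryTime, ZoneId zone) {
        this.expiryTime = expiryTime;
        this.zone = zone;
    }

    public boolean isInvalid(Instant readTime, Instant now) {
        ZonedDateTime read = readTime.atZone(zone);
        ZonedDateTime nextExpiry = read.with(expiryTime);
        if (!nextExpiry.isAfter(read)) {
            nextExpiry = nextExpiry.plusDays(1); // today's expiry already passed
        }
        return !now.isBefore(nextExpiry.toInstant());
    }
}
```

A cache using such a rule would check isInvalid on each read and go back to the data source when it returns true. That mirrors the behavior described for invalid objects above.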
You can configure a cache invalidation policy in the following ways:
- At the project level, applying to all objects (Configuring Cache Expiration at the Project Level)
- At the descriptor level, overriding the project-level configuration on a per-class basis (Configuring Cache Expiration at the Descriptor Level)
- At the query level, applying to the results returned by the query (How to Configure Cache Expiration at the Query Level)
If you configure a query to cache results in its own internal cache (see How to Cache Query Results in the Query Cache), the cache invalidation policy you configure at the query level applies to the query's internal cache in the same way it would apply to the session cache.
If you are using a coordinated cache (see Cache Coordination), you can customize how EclipseLink communicates the fact that an object has been declared invalid. For more information, see Configuring Cache Coordination Change Propagation at the Descriptor Level.
Cache Coordination
The need to maintain up-to-date data for all applications is a key design challenge for building a distributed application. The difficulty increases as the number of servers within an environment increases. EclipseLink provides a distributed cache coordination feature that ensures data in distributed applications remains current.
Cache coordination reduces the number of optimistic lock exceptions encountered in a distributed architecture, and decreases the number of failed or repeated transactions in an application. However, cache coordination in no way eliminates the need for an effective locking policy. To effectively ensure working with up-to-date data, cache coordination must be used with optimistic or pessimistic locking. We recommend that you use cache coordination with an optimistic locking policy (see Configuring Locking Policy).
You can use cache invalidation to improve cache coordination efficiency. For more information, see Cache Invalidation.
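To illustrate why invalidation messages reduce stale reads, here is a toy Java model of two servers sharing one data source. It is only a sketch: there is no real transport, no locking, and no EclipseLink API, and the CoordinatedNode class is hypothetical. After a commit, the committing node tells its peers to drop the changed key, and each peer re-reads it from the data source on next access.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of invalidation-based cache coordination between server nodes.
class CoordinatedNode {
    private final Map<Integer, String> cache = new HashMap<>();
    private final List<CoordinatedNode> peers = new ArrayList<>();
    private final Map<Integer, String> database; // shared data source

    CoordinatedNode(Map<Integer, String> database) {
        this.database = database;
    }

    void addPeer(CoordinatedNode peer) {
        peers.add(peer);
    }

    // Cache-first read; a miss loads the current value from the data source.
    String read(int id) {
        return cache.computeIfAbsent(id, database::get);
    }

    // Commit locally, then tell every peer that its cached copy is invalid.
    void commit(int id, String value) {
        database.put(id, value);
        cache.put(id, value);
        for (CoordinatedNode peer : peers) {
            peer.invalidate(id);
        }
    }

    void invalidate(int id) {
        cache.remove(id);
    }
}
```

Messages in a real cluster arrive asynchronously, so a peer can still read a stale copy in the window before the invalidation lands. This is why the text above pairs cache coordination with a locking policy.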
Cache Isolation
Isolated client sessions provide a mechanism for disabling the shared server session cache. Any classes marked as isolated cache objects only for the life cycle of their client session; these classes never use the shared server session cache. This is the best mechanism for preventing caching, because it is configured on a per-class basis, allowing caching for some classes and preventing it for others.
For more information, see Isolated Client Sessions.
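Per-class isolation can be pictured with the following Java sketch. It is a hypothetical simplification, not EclipseLink's session classes, and it uses plain string keys instead of primary keys. Objects of classes marked isolated go into a cache owned by one client session. Everything else goes into the shared cache that all sessions see.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Shared cache plus the set of classes that must never be placed in it.
class SharedServerCache {
    final Map<String, Object> entries = new HashMap<>();
    final Set<Class<?>> isolatedClasses = new HashSet<>();
}

// Sketch of a client session that routes objects by class.
class ClientSessionSketch {
    private final SharedServerCache shared;
    private final Map<String, Object> isolatedCache = new HashMap<>(); // lives as long as this session

    ClientSessionSketch(SharedServerCache shared) {
        this.shared = shared;
    }

    void cache(String key, Object value) {
        if (shared.isolatedClasses.contains(value.getClass())) {
            isolatedCache.put(key, value); // visible to this session only
        } else {
            shared.entries.put(key, value); // visible to every session
        }
    }

    Object lookup(String key) {
        Object local = isolatedCache.get(key);
        return local != null ? local : shared.entries.get(key);
    }
}
```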
Cache Locking and Transaction Isolation
By default, EclipseLink optimizes concurrency to minimize cache locking during read or write operations. Use the default EclipseLink transaction isolation configuration unless you have a very specific reason to change it.
For more information, see Database Transaction Isolation Levels.
Cache Optimization
Tune the EclipseLink cache for each class to help eliminate the need for distributed cache coordination. Always tune these settings before implementing cache coordination.
For more information, see Optimizing Cache.