EclipseLink/DesignDocs/298985

Revision as of 15:08, 4 February 2010 by James.sutherland.oracle.com (Singleton cache keys and cache refactoring)

Design Specification: Performance and Concurrency

ER 298985

Feedback

Document History

Date | Author | Version | Description & Notes
2010-01-06 | James | 0.1 | Draft
2010-01-206 | James | 0.2 | Updated CacheId, batch reading

Project overview

This project groups several smaller performance-related bug fixes and enhancements into a single unit. Its goal is to improve the performance, concurrency and scalability of the product.

Concepts

Performance is concerned with reducing CPU usage and finding more efficient ways of processing operations.

Concurrency is concerned with reducing contention and improving multi-threaded and multi-CPU performance.

Scalability is concerned with clustering, large workloads and data.

Requirements

The goal of this project is to ensure that our product remains the leading high-performance persistence solution. Areas of improvement are determined through performance comparison with other persistence products and benchmarking.

Specific performance investigations desired for this release:

  • JPA performance comparison with EclipseLink 2.0
  • core performance comparison with EclipseLink 2.0
  • JPA concurrency comparison with EclipseLink 2.0
  • JPA provider and app server comparison through SPECjAppServer ® benchmark.

Design Constraints

The goal of the project is to improve performance of common usage patterns. Fringe features and usage patterns will not be specifically targeted unless found to be highly deficient.

Any optimization must also be weighed against its impact on usability and spec compliance. Optimizations that may have a large negative impact on usability may need to be enabled only through specific configuration.

Functionality

Each specific performance improvement is discussed separately below.

Building objects from ResultSets

There is currently an old prototype for building objects directly from ResultSets. The goal of the feature is to allow "simple" objects and queries to bypass the intermediate DatabaseRow objects that are built from JDBC and then used to build objects. It also aims to avoid many of the checks for non-core features and to simplify the object-building process.

This will introduce a second path through querying and object building for these optimized queries, which should avoid much of the general overhead required to support advanced features. The feature will only be used when enabled on a query (or perhaps on a class, through configuration), or when the class and query are determined to be "simple".

Initially, "simple" will include only direct mappings, but this will hopefully be expanded to include relationships with single primary keys, or perhaps composite primary keys. It will not include inheritance, events, complex queries, fetch groups, etc.
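The two object-building paths can be contrasted with a minimal Java sketch. It assumes a hypothetical `Employee` class with only direct mappings (columns ID, NAME and SALARY, read by position); `buildDirect` reads each value straight from the JDBC `ResultSet`, while `buildViaRow` first copies the row into a `Map`, standing in for the intermediate DatabaseRow. None of these names are EclipseLink API. So that the sketch runs without a database, `fakeResultSet` backs a `java.sql.ResultSet` with an in-memory array through `java.lang.reflect.Proxy` and supports only the few methods used here.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SimpleObjectBuilder {

    // Hypothetical "simple" entity: direct mappings only, no relationships or inheritance.
    public record Employee(long id, String name, double salary) {}

    // Optimized path: build each object directly from the current ResultSet row.
    public static List<Employee> buildDirect(ResultSet rs) throws SQLException {
        List<Employee> result = new ArrayList<>();
        while (rs.next()) {
            result.add(new Employee(rs.getLong(1), rs.getString(2), rs.getDouble(3)));
        }
        return result;
    }

    // General path: copy each row into an intermediate map first (stand-in for DatabaseRow).
    public static List<Employee> buildViaRow(ResultSet rs, String[] columns) throws SQLException {
        List<Employee> result = new ArrayList<>();
        while (rs.next()) {
            Map<String, Object> row = new HashMap<>();
            for (int i = 0; i < columns.length; i++) {
                row.put(columns[i], rs.getObject(i + 1));
            }
            result.add(new Employee((Long) row.get("ID"), (String) row.get("NAME"),
                    (Double) row.get("SALARY")));
        }
        return result;
    }

    // Test helper: an in-memory ResultSet supporting only next() and positional getters.
    public static ResultSet fakeResultSet(Object[][] rows) {
        int[] cursor = {-1};
        return (ResultSet) Proxy.newProxyInstance(
                SimpleObjectBuilder.class.getClassLoader(),
                new Class<?>[] {ResultSet.class},
                (proxy, method, args) -> {
                    switch (method.getName()) {
                        case "next":
                            return ++cursor[0] < rows.length;
                        case "getObject":
                        case "getLong":
                        case "getString":
                        case "getDouble":
                            return rows[cursor[0]][(Integer) args[0] - 1]; // JDBC columns are 1-based
                        default:
                            throw new UnsupportedOperationException(method.getName());
                    }
                });
    }
}
```

Both paths produce equal objects; the direct path simply skips allocating and populating one map per row, which is the kind of overhead the optimized path is meant to remove.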

Singleton cache keys and cache refactoring

Cache access can currently be an expensive operation. This will be improved by simplifying the cache key. For objects with a single (singleton) primary key, the Id value itself (Integer, Long, String) will be used as the cache key. This will have a very broad impact, as it changes the type of the primary key from Vector to Object. A new CacheId object will be used for composite or complex primary keys. The CacheId will be a basic wrapper for an Object array, adding equals and hashCode implementations. A CacheKey will still be used as the cache value.
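A minimal sketch of a CacheId along these lines, assuming the primary key values are immutable once the object is cached (so the hash can be computed once). The field and method names are illustrative and not necessarily those of the shipped EclipseLink class.

```java
import java.util.Arrays;

// Illustrative composite cache key: an immutable wrapper over the primary key values.
public final class CacheId {
    private final Object[] primaryKey;
    private final int hash; // computed once, since cache lookups call hashCode() repeatedly

    public CacheId(Object... primaryKey) {
        this.primaryKey = primaryKey.clone(); // defensive copy keeps the stored hash valid
        this.hash = Arrays.hashCode(this.primaryKey);
    }

    public Object[] getPrimaryKey() {
        return primaryKey.clone();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CacheId)) {
            return false;
        }
        return Arrays.equals(primaryKey, ((CacheId) other).primaryKey);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public String toString() {
        return "CacheId" + Arrays.toString(primaryKey);
    }
}
```

Two CacheIds built from equal values are equal and hash identically, so a new instance created for a lookup finds the entry stored under an earlier instance.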

The cacheKeyType will be configurable on a ClassDescriptor or through the existing @PrimaryKey annotation.
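How a configured cache key type could select the key is sketched below. The `CacheKeyType` enum here mirrors only the two values listed under API in this document, and `cacheKey` is a hypothetical helper; an immutable `List` stands in for CacheId because it already provides value-based equals and hashCode.

```java
import java.util.List;

public class CacheKeySelection {

    // Hypothetical mirror of the CacheKeyType values named in this document.
    public enum CacheKeyType { ID_VALUE, CACHE_ID }

    // Returns the object used as the cache key for the given primary key values.
    public static Object cacheKey(CacheKeyType type, Object... primaryKey) {
        if (type == CacheKeyType.ID_VALUE) {
            if (primaryKey.length != 1) {
                throw new IllegalArgumentException("ID_VALUE requires a single primary key field");
            }
            return primaryKey[0]; // the Integer, Long or String Id is used as-is
        }
        return List.of(primaryKey); // stand-in for CacheId: value-based equals/hashCode
    }
}
```

With ID_VALUE no wrapper object is allocated per lookup, which is the intended saving for singleton keys.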

This would affect a lot of internal API, as well as some external API that currently is typed to Vector. The public API taking Vector could still be supported, but the API returning Vector would either need to be changed, or new methods added and old ones deprecated.

The purpose of this change is performance. It also has the benefit of removing our usage of the legacy Vector API. For JPA classes that use a single simple Id value, it has the further benefit that the JPA Id value is the cache key. For a JPA IdClass or EmbeddedId the cache key will not match the JPA Id, but the cache key is mainly an internal value and should reflect what is optimal for cache usage. Because this work removes the casting of the primary key to Vector, it would make it easy to support using the JPA IdClass as the cache key if desired (as a separate feature unrelated to performance). Extreme caution should be used in doing this, however, as it requires the user to implement equals() and hashCode() correctly in their IdClass, which is easy to get wrong. It would also have a negative performance impact, as building the IdClass in our internal cache usage would be much less efficient than using the CacheId, and the user's equals() and hashCode() implementations are most likely not optimal.
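The equals()/hashCode() risk with an IdClass can be shown in a few lines. The sketch assumes a hypothetical `EmployeePK` IdClass in two versions, one relying on the default identity-based `Object.equals()` and one with value-based implementations, and uses a `HashMap` as a stand-in for the identity map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class IdClassPitfall {

    // Hypothetical IdClass that forgets equals()/hashCode(): only identical instances match.
    public static final class BrokenEmployeePK {
        final long id;
        final String country;

        public BrokenEmployeePK(long id, String country) {
            this.id = id;
            this.country = country;
        }
    }

    // The same IdClass with value-based equals()/hashCode().
    public static final class EmployeePK {
        final long id;
        final String country;

        public EmployeePK(long id, String country) {
            this.id = id;
            this.country = country;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof EmployeePK other && id == other.id && country.equals(other.country);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, country);
        }
    }

    // Caches an object under one key instance, then looks it up with a separate key instance.
    public static boolean lookupHits(Object storeKey, Object lookupKey) {
        Map<Object, String> cache = new HashMap<>();
        cache.put(storeKey, "Employee(1, CA)");
        return cache.get(lookupKey) != null;
    }
}
```

With the broken version, a second key instance holding the same values never finds the cached object, so every lookup misses.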

The existing API on IdentityMapAccessor, ReadObjectQuery and ReportQuery currently uses Vector for the primary key. This API will still be supported, but deprecated. New API that takes Object for the primary key will be added.
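One way to keep the deprecated Vector-based API while adding Object-based methods is to have the old overload convert its argument and delegate. `SimpleIdentityMapAccessor` below is a hypothetical, much-reduced stand-in for IdentityMapAccessor, not its real implementation; a one-element Vector is assumed to map to the bare Id value and longer ones to a value-comparable key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Vector;

public class SimpleIdentityMapAccessor {

    // Hypothetical cached class used for illustration.
    public record Employee(long id, String name) {}

    // One identity map per class: primary key -> cached object.
    private final Map<Class<?>, Map<Object, Object>> maps = new HashMap<>();

    public void putInIdentityMap(Object object, Object primaryKey) {
        maps.computeIfAbsent(object.getClass(), c -> new HashMap<>()).put(primaryKey, object);
    }

    // New API: the primary key is an Object (the bare Id value, or a composite key object).
    public Object getFromIdentityMap(Object primaryKey, Class<?> theClass) {
        Map<Object, Object> map = maps.get(theClass);
        return map == null ? null : map.get(primaryKey);
    }

    // Old API, kept for compatibility: converts the Vector and delegates to the new method.
    @Deprecated
    public Object getFromIdentityMap(Vector<?> primaryKey, Class<?> theClass) {
        return getFromIdentityMap(toKey(primaryKey), theClass);
    }

    // A one-element key becomes the bare Id value; longer keys become an immutable,
    // value-comparable List (standing in for a CacheId).
    static Object toKey(List<?> primaryKey) {
        return primaryKey.size() == 1 ? primaryKey.get(0) : List.copyOf(primaryKey);
    }
}
```

Because `Vector` is more specific than `Object`, existing callers that pass a Vector still resolve to the deprecated overload at compile time, while new callers pass the Id directly.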

The JPA Cache interface will be extended in the same pattern as our JpaEntityManager to expose our additional cache API using the JPA Id. This will make our internal cache key type transparent to JPA users.

Batch reading using EXISTS and IN

Currently batch reading uses a join of the batch query to the source query.

This join has some issues:

  • For 1-m or m-m relationships, the join can select duplicate rows, which require a DISTINCT to filter out.
  • DISTINCT does not work with LOBs.
  • DISTINCT may be less efficient than the alternatives on some databases.
  • It does not work well with cursors.
  • It still needs to be verified whether it works with pagination.
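The duplicate-row problem can be simulated in memory. The sketch assumes a hypothetical m-m relationship between employees and projects, where the projects of three source employees are batch-read through a join; because projects are shared, the joined result repeats them and a DISTINCT step is needed. It models the rows only and does not execute SQL.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class BatchJoinDuplicates {

    // Hypothetical m-m join table: employee id -> project names (projects are shared).
    static final Map<Integer, List<String>> EMP_PROJ = Map.of(
            1, List.of("Apollo", "Gemini"),
            2, List.of("Apollo"),
            3, List.of("Gemini"));

    // Simulates: SELECT p.* FROM PROJECT p, EMP_PROJ ep, EMPLOYEE e
    //            WHERE ep.PROJ_ID = p.ID AND ep.EMP_ID = e.ID AND <source criteria>
    public static List<String> joinRows(List<Integer> sourceEmployeeIds) {
        List<String> rows = new ArrayList<>();
        for (int empId : sourceEmployeeIds) {
            rows.addAll(EMP_PROJ.getOrDefault(empId, List.of()));
        }
        return rows;
    }

    // Simulates SELECT DISTINCT: drops repeated rows, keeping first-seen order.
    public static List<String> distinct(List<String> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }
}
```

For source employees 1, 2 and 3 the join yields four rows for only two distinct projects.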

One alternative is to use an EXISTS with a sub-select instead of a JOIN. This should not result in duplicate rows, so it avoids the issues with DISTINCT.

Another option is to load the target objects using an IN clause containing the primary keys of the source query's objects. This would also work with cursors, but it has the limitation of requiring custom SQL support for composite primary keys, and it produces a large dynamic SQL query.
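The SQL shapes of the two alternatives can be sketched as string builders. The schema is hypothetical (a PHONE table whose OWNER_ID column references the ID column of the EMPLOYEE source table), and `sourceCriteria` is assumed to be a trusted SQL fragment over the alias `s`, generated from the source query and never taken from user input. For composite keys the IN form needs a row-value constructor such as `(A, B) IN ((?, ?), ...)`, which not every database supports.

```java
import java.util.List;
import java.util.StringJoiner;

public class BatchFetchSql {

    // EXISTS variant: a correlated sub-select re-applies the source query's criteria,
    // so each target row is selected at most once and no DISTINCT is needed.
    public static String existsSql(String targetTable, String fkColumn, String sourceTable,
                                   String sourceCriteria) {
        return "SELECT t.* FROM " + targetTable + " t WHERE EXISTS (SELECT 1 FROM "
                + sourceTable + " s WHERE s.ID = t." + fkColumn + " AND " + sourceCriteria + ")";
    }

    // IN variant: one bind-parameter group per already-read source object.
    // More than one key column requires row-value syntax (not portable across databases).
    public static String inSql(String targetTable, List<String> fkColumns, int sourceCount) {
        int n = fkColumns.size();
        String columns = n == 1 ? fkColumns.get(0) : "(" + String.join(", ", fkColumns) + ")";
        String tuple = n == 1 ? "?" : "(" + "?, ".repeat(n - 1) + "?)";
        StringJoiner values = new StringJoiner(", ", "(", ")");
        for (int i = 0; i < sourceCount; i++) {
            values.add(tuple);
        }
        return "SELECT * FROM " + targetTable + " WHERE " + columns + " IN " + values;
    }
}
```

The IN statement grows with the number of source objects, which is the cost of the large dynamic SQL query mentioned above.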

A new BatchFetchType enum will be defined, and the usesBatchReading flag will be enhanced to setBatchFetch, allowing for JOIN, EXISTS or IN. This option will also be added to ObjectLevelReadQuery, rolling up the current four batch reading properties into a new BatchFetchPolicy and moving them up from ReadAllQuery so that ReadObjectQuery can also specify nested batched attributes. A new BatchFetch annotation and query hint will be added.
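A possible shape for the rolled-up settings, sketched with toy classes. The `BatchFetchType` values come from this document; `BatchFetchPolicy`, its methods, and the reduced `ObjectLevelReadQuery`, `ReadAllQuery` and `ReadObjectQuery` classes here are hypothetical stand-ins, not the EclipseLink classes of the same names. JOIN is assumed as the default to match the current join-based behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchFetchPolicySketch {

    public enum BatchFetchType { JOIN, EXISTS, IN }

    // Hypothetical: the batch reading settings grouped into one policy object.
    public static final class BatchFetchPolicy {
        private BatchFetchType type = BatchFetchType.JOIN; // assumed default (current behavior)
        private final List<String> attributes = new ArrayList<>();

        public BatchFetchType getType() { return type; }
        public void setType(BatchFetchType type) { this.type = type; }
        public void addAttribute(String attributeName) { attributes.add(attributeName); }
        public boolean isAttributeBatchRead(String attributeName) { return attributes.contains(attributeName); }
    }

    // Toy superclass: holding the policy here lets both query subclasses use it.
    public abstract static class ObjectLevelReadQuery {
        private BatchFetchPolicy batchFetchPolicy;

        public BatchFetchPolicy getBatchFetchPolicy() {
            if (batchFetchPolicy == null) {
                batchFetchPolicy = new BatchFetchPolicy(); // created lazily, only when used
            }
            return batchFetchPolicy;
        }

        public void setBatchFetchType(BatchFetchType type) { getBatchFetchPolicy().setType(type); }
        public void addBatchReadAttribute(String name) { getBatchFetchPolicy().addAttribute(name); }
    }

    public static class ReadAllQuery extends ObjectLevelReadQuery {}
    public static class ReadObjectQuery extends ObjectLevelReadQuery {}
}
```

Placing the policy on the common superclass is what lets a single-object query declare batched attributes as well.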

Testing

Both the existing performance and concurrency tests and public benchmarks will be used to monitor and evaluate performance improvements.

Specific performance testing desired for this release:

  • JPA performance comparison with EclipseLink 2.0
  • core performance comparison with EclipseLink 2.0
  • JPA concurrency comparison with EclipseLink 2.0
  • JPA provider and app server comparison through SPECjAppServer ® benchmark.

API

Singleton cache keys and cache refactoring

(old API is still supported, but deprecated)

  • IdentityMapAccessor
    • *(Vector, Class) -> *(Object, Class)
  • Session
    • keyFromObject(Object) -> getId(Object)
  • ReadObjectQuery
    • get/setSelectionKey(List) -> get/setSelectionId(Object)
  • ReportQueryResult
    • getPrimaryKeyValues() -> getId()
  • @PrimaryKey
    • cacheKeyType CacheKeyType (enum, ID_VALUE, CACHE_ID)
  • JpaCache
    • clear()
    • clear(Class)
    • clearQueryCache()
    • clearQueryCache(String)
    • timeToLive(Object)
    • isValid(Object)
    • isValid(Class, Object)
    • print()
    • print(Class)
    • printLocks()
    • validate()
    • getObject(Class, Object)
    • putObject(Object)
    • removeObject(Object)
    • removeObject(Class, Object)
    • contains(Object)
    • evict(Object)
    • getId(Object)

Batch reading using EXISTS and IN

  • BatchFetchType (JOIN, EXISTS, IN)
  • @BatchFetch
  • ForeignReferenceMapping.setBatchFetch(BatchFetchType)
  • ObjectLevelReadQuery.setBatchFetchType(BatchFetchType)

Config files

  • orm.xml
    • support for cache-key-type on <primary-key>

Documentation

Open Issues

Issue # | Owner | Description / Notes
1 | Group | What is the impact of the cache refactoring on Cache interceptors integration?
2 | Group | What is the impact of the cache refactoring on backward compatibility?

Decisions

Issue # | Description / Notes | Decision

Future Considerations

Continually improve performance.
