EclipseLink/DesignDocs/221546

Revision as of 10:29, 12 May 2008 by James.sutherland.oracle.com

Design Specification: Performance and Concurrency

ER 221546

Feedback

Document History

Date | Author | Version | Description & Notes
2008-03-05 | James | 0.1 | Draft

Project overview

This project groups several smaller performance-related bug fixes and enhancements into a single unit. Its goal is to improve the performance, concurrency, and scalability of the product.

Concepts

Performance is concerned with reducing CPU usage and finding more efficient ways of processing operations.

Concurrency is concerned with reducing contention and improving multi-threaded and multi-CPU performance.

Scalability is concerned with clustering, large workloads and data.

Requirements

The goal of this project is to ensure that our product remains the leading high-performance persistence solution. Areas of improvement are determined through performance comparison with other persistence products and benchmarking.

Specific performance investigations desired for this release:

  • JPA performance comparison with TopLink 11g
  • core performance comparison with TopLink 11g
  • core concurrency comparison with TopLink 11g
  • JPA cache coordination comparison in clustered environment
  • JPA performance comparison with Hibernate
  • JPA performance comparison with OpenJPA
  • JPA concurrency comparison with Hibernate
  • JPA provider and app server comparison through the SPECjAppServer® benchmark.

Design Constraints

The goal of the project is to improve performance of common usage patterns. Fringe features and usage patterns will not be specifically targeted unless found to be highly deficient.

Any optimization must also be weighed against its impact on usability and spec compliance. Optimizations that may have a large negative impact on usability may need to be enabled only through specific configuration.

Functionality

Each specific performance improvement is discussed separately below.

EntityManager.getReference() Proxies

Requirement

The JPA spec provides both find() and getReference(). find() should query the database for the instance, whereas getReference() should return a "proxy" stand-in for the object. Currently getReference() does a find(); it should instead return a proxy, which will significantly improve the performance of the operation. The main usage of this API is to allow an object to be inserted or updated with references to other objects without having to read those objects.

Design

Fetch groups will be used to build an unfetched proxy. A new instance will be created, its primary key set, and it will be given a fetch group composed of its primary key attributes. It will then be registered with the UnitOfWork.
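The idea of an unfetched proxy can be illustrated with a small, library-free sketch. This is not EclipseLink code: the class UnfetchedReference, its loader function, and the IllegalStateException used for a missing row are all hypothetical stand-ins. The sketch only models the behaviour described above, where the primary key is available immediately and the rest of the object is read on first access.

```java
import java.util.function.Function;

/** Hypothetical stand-in for an unfetched proxy; not an EclipseLink class. */
final class UnfetchedReference<K, T> {
    private final K primaryKey;
    private final Function<K, T> loader; // stands in for the database read
    private T target;                    // stays null until first non-key access
    private boolean fetched;

    UnfetchedReference(K primaryKey, Function<K, T> loader) {
        this.primaryKey = primaryKey;
        this.loader = loader;
    }

    /** The primary key is always available and never triggers a read. */
    K getPrimaryKey() {
        return primaryKey;
    }

    boolean isFetched() {
        return fetched;
    }

    /** Any non-key access loads the full object, at most once. */
    synchronized T get() {
        if (!fetched) {
            target = loader.apply(primaryKey);
            if (target == null) {
                // Assumption: a missing row surfaces only on first access.
                throw new IllegalStateException("No object for primary key " + primaryKey);
            }
            fetched = true;
        }
        return target;
    }
}
```

In this model, using the reference only as a foreign key (calling getPrimaryKey()) never invokes the loader, which is the case the requirement is aiming to make cheap.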

Testing

  • getReference() without weaving
  • getReference() outside a transaction
  • getReference() and access to a get method
  • getReference() and update to a set method
  • getReference() used in an update of another object
  • getReference() used in an insert of another object

Indirection with EAGER

Requirement

Currently if EAGER relationships are used there are two main performance side-effects:

  • Change tracking is disabled.
  • Deferred cache locks are used.

It should be possible to use EAGER relationships and not suffer these performance issues.

Design

Provide an option to still enable indirection in the mappings, but instantiate the indirection eagerly. An isLazy option will be added to ForeignReferenceMapping. If eager weaving is enabled, the mapping will always be configured to use indirection, and indirection will be woven even if the relationship is eager. After building an object (and releasing the cache lock), the ObjectBuilder will instantiate all eager relationships.
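The ordering in this design (build with indirection while holding the cache lock, then instantiate eager relationships after releasing it) can be sketched without any EclipseLink types. SketchValueHolder and SketchObjectBuilder below are hypothetical names chosen for illustration, and the single ReentrantLock is a simplifying assumption about how the cache lock behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical value holder: wraps a relationship and loads it on demand. */
final class SketchValueHolder<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean instantiated;

    SketchValueHolder(Supplier<T> loader) {
        this.loader = loader;
    }

    boolean isInstantiated() {
        return instantiated;
    }

    synchronized T getValue() {
        if (!instantiated) {
            value = loader.get();
            instantiated = true;
        }
        return value;
    }
}

/** Hypothetical builder showing the ordering described above. */
final class SketchObjectBuilder {
    private final ReentrantLock cacheLock = new ReentrantLock();

    /** True while the calling thread holds the cache lock. */
    boolean isCacheLockHeld() {
        return cacheLock.isHeldByCurrentThread();
    }

    /** Creates holders under the lock, then instantiates them after release. */
    <T> List<SketchValueHolder<T>> build(List<Supplier<T>> eagerRelationships) {
        List<SketchValueHolder<T>> holders = new ArrayList<>();
        cacheLock.lock();
        try {
            for (Supplier<T> loader : eagerRelationships) {
                holders.add(new SketchValueHolder<>(loader)); // nothing loaded yet
            }
        } finally {
            cacheLock.unlock();
        }
        for (SketchValueHolder<T> holder : holders) {
            holder.getValue(); // eager instantiation, outside the lock
        }
        return holders;
    }
}
```

Because the related objects are loaded only after the lock is released, the build itself never has to hold a cache lock while reading other objects, which is the situation that currently forces deferred cache locks.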

Testing

  • weaving test to ensure an object with an eager relationship is still woven for indirection and change tracking
  • if eager weaving is not the default, run the entity manager tests with the eager weaving option set.

Existence Validation

Requirement

Currently the does-exist query is executed on persist(). This adds overhead and can cause database access if the cache is invalid. The existence check is also executed for 0 id values, but 0 should be interpreted the same as null.

Design

Since persist() can only be called for new objects or managed objects, and cannot be called for detached existing objects, there is little reason to execute an existence check. The check is only for validation purposes: it throws an error earlier instead of allowing the insert to cause a constraint error. Since only the cache was checked (unless the cache was invalid or an early transaction was in progress), this would not even always catch the user error. Instead, if persist() is called on a non-managed object, the object will be assumed to be new. A database constraint error will occur if it already exists (assuming constraints are defined). A persistence unit property, "eclipselink.validate-existence", will be added to allow the user to enforce validation. Note that this will execute the normal does-exist query that the user configured, without overriding the user's configuration to check-cache as was done previously, so it may cause database access if the object is not in the cache and has a non-null primary key.
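The decision described above can be summarised in a short sketch. PersistSketch, its Outcome enum, and the BooleanSupplier standing in for the user's does-exist query are hypothetical and exist only to make the flow concrete; the real logic lives inside the persistence provider.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical model of the persist() decision; not EclipseLink code. */
final class PersistSketch {
    enum Outcome { ALREADY_MANAGED, ASSUMED_NEW, REJECTED_AS_EXISTING }

    /**
     * @param managed           whether the object is already managed
     * @param validateExistence value of "eclipselink.validate-existence"
     * @param doesExistQuery    stand-in for the configured does-exist query
     */
    static Outcome onPersist(boolean managed,
                             boolean validateExistence,
                             BooleanSupplier doesExistQuery) {
        if (managed) {
            return Outcome.ALREADY_MANAGED;
        }
        // The does-exist query runs only when validation is requested.
        if (validateExistence && doesExistQuery.getAsBoolean()) {
            return Outcome.REJECTED_AS_EXISTING;
        }
        // Default: assume new; a database constraint catches real duplicates.
        return Outcome.ASSUMED_NEW;
    }
}
```

With validation off, the query supplier is never invoked, so persist() on a non-managed object costs no existence check at all.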

The primary key is currently checked for null, but not for 0. Any primitive id will be 0, not null, so this can result in execution of the does-exist query and database access for new objects. The null primary key check will be expanded to also check for 0. This will be optimized to check only long and int, as BigDecimal and other Number types can be null, and conversion should be avoided. This also provides a workaround if someone really wants a 0 id. The current 0 check in sequencing will be removed in favor of this check.
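A minimal sketch of the expanded check follows, assuming the id arrives as a boxed value. PrimaryKeySketch and isNullOrZero are hypothetical names. Only Long and Integer are tested for 0, so no numeric conversion is performed and other Number types remain usable for a genuine 0 id.

```java
/** Hypothetical helper illustrating the null-or-zero primary key check. */
final class PrimaryKeySketch {
    /** Returns true when the id should be treated as "not yet assigned". */
    static boolean isNullOrZero(Object id) {
        if (id == null) {
            return true;
        }
        // Primitive long/int ids are boxed as Long/Integer and default to 0.
        if (id instanceof Long) {
            return ((Long) id).longValue() == 0L;
        }
        if (id instanceof Integer) {
            return ((Integer) id).intValue() == 0;
        }
        // BigDecimal and other types can be null, so their 0 is a real value.
        return false;
    }
}
```

Under this sketch, a BigDecimal id of 0 is treated as an assigned key, which corresponds to the workaround mentioned above.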

Testing

  • Most test models will use the default existence behavior, and some will use the property and ensure the validation exception occurs. Since the advanced model already validates this check, adding the property to it will verify that the property works.
  • New test to check that a 0 primary key value does not result in database access and has a sequence value assigned.

Testing

Both the existing performance and concurrency tests and public benchmarks will be used to monitor and evaluate performance improvements.

Specific performance testing desired for this release:

  • JPA performance comparison with TopLink 11g
  • core performance comparison with TopLink 11g
  • core concurrency comparison with TopLink 11g
  • JPA cache coordination comparison in clustered environment
  • JPA performance comparison with Hibernate
  • JPA performance comparison with OpenJPA
  • JPA concurrency comparison with Hibernate
  • JPA provider and app server comparison through the SPECjAppServer® benchmark.

API

  • EntityManager.getReference() - Will now return a proxy.
  • UnitOfWork.getReference() - Returns an object or proxy by primary key.
  • EntityManager.persist() - Will no longer throw an error for detached existing objects (that are in the cache).

Config files

persistence.xml

  • eclipselink.weaving.eager (true/false, default?)
  • eclipselink.validate-existence (true/false, default=false)

GUI

Documentation

  • Document that EntityManager.getReference() now returns a proxy (unfetched) instance, and subsequent access may trigger an ObjectNotFoundException if the object does not exist. Document that fetch-group weaving must be enabled (the default when weaving) to allow getReference() to return a proxy; otherwise it does a find().
  • Document weaving eager setting and update weaving/change tracking defaults.
  • Document validate existence property.
  • Document 0 is no longer a valid primary key value (for a primitive field).

Open Issues

Issue # | Owner | Description / Notes
1 | Group | Should weaving.eager be true or false by default?

Decisions

Issue # | Description / Notes | Decision
1 | getReference() | Use fetch-groups to create a proxy.
2 | getReference() | Add as UnitOfWork API.
3 | persist() | Do not execute does-exist.

Future Considerations

Continually improve performance.