EclipseLink/DesignDocs/221546(1.1)

From Eclipsepedia

Design Specification: Performance and Concurrency

ER 221546

Feedback

Document History

Date Author Version Description & Notes
2008-07-17 James 0.1 Draft

Project overview

This project groups several smaller performance-related bug fixes and enhancements into a single unit. Its goal is to improve the performance, concurrency and scalability of the product.

Concepts

Performance is concerned with reducing CPU usage and finding more efficient ways of processing operations.

Concurrency is concerned with reducing contention and improving multi-threaded and multi-CPU performance.

Scalability is concerned with clustering, large workloads and data.

Requirements

The goal of this project is to ensure that our product remains the leading high-performance persistence solution. Areas of improvement are determined through performance comparison with other persistence products and benchmarking.

Specific performance investigations desired for this release:

  • JPA performance comparison with EclipseLink 1.0
  • core performance comparison with EclipseLink 1.0
  • core concurrency comparison with EclipseLink 1.0
  • JPA cache coordination comparison in clustered environment
  • JPA performance comparison with Hibernate
  • JPA performance comparison with OpenJPA
  • JPA concurrency comparison with Hibernate
  • JPA provider and app server comparison through SPECjAppServer® benchmark

Design Constraints

The goal of the project is to improve performance of common usage patterns. Fringe features and usage patterns will not be specifically targeted unless found to be highly deficient.

Any optimization must also be weighed against its impact on usability and spec compliance. Optimizations that may have a large negative impact on usability may need to be enabled only through specific configuration.

Functionality

Each specific performance improvement is discussed separately below.

Sequencing

In EclipseLink 1.0 the Sequence object is stored in the Platform, and every sequence operation must look up the sequence before performing the operation. For a single insert, sequencing performs the following steps, each requiring a lookup:

  • persist: check sequence type
  • persist: check if id is assigned
  • persist: assign id
  • commit: check sequence type
  • commit: check if id is assigned
  • insert: check sequence type
  • insert: check if id is assigned

Sequencing will be optimized to cache the Sequence object on the descriptor, which avoids the cost of the lookup. The sequence mapping will also be stored on the ObjectBuilder to optimize the sequence operations. The id check will be removed from sequencing and replaced with a simple null/<=0 check, according to the descriptor's IdValidation policy. This will avoid the majority of sequence lookups. The sequence check in the pre-insert will also be removed when using a UnitOfWork, as the id is assigned in commit.
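The caching idea can be illustrated with a minimal, self-contained sketch. All class and method names below (SketchSequence, SketchPlatform, SketchDescriptor, assignIdIfNeeded) are hypothetical stand-ins invented for illustration and are not the EclipseLink classes. The null/<=0 test models only one possible IdValidation setting.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a sequence; not the EclipseLink Sequence class. */
final class SketchSequence {
    private long next = 1;

    long nextValue() {
        return next++;
    }
}

/** Hypothetical platform that holds sequences by name, as in the 1.0 design. */
final class SketchPlatform {
    private final Map<String, SketchSequence> sequences = new HashMap<>();
    int lookups = 0; // counts lookups so the saving is visible

    void addSequence(String name, SketchSequence sequence) {
        sequences.put(name, sequence);
    }

    SketchSequence getSequence(String name) {
        lookups++;
        return sequences.get(name);
    }
}

/** Hypothetical descriptor that resolves its sequence once and caches it. */
final class SketchDescriptor {
    private final SketchSequence sequence; // cached when the descriptor is built

    SketchDescriptor(SketchPlatform platform, String sequenceName) {
        this.sequence = platform.getSequence(sequenceName); // the only lookup
    }

    /** Null/<=0 check, standing in for one possible IdValidation setting. */
    static boolean isIdUnassigned(Long id) {
        return id == null || id <= 0;
    }

    /** Assigns an id from the cached sequence only when none is set yet. */
    Long assignIdIfNeeded(Long currentId) {
        return isIdUnassigned(currentId) ? Long.valueOf(sequence.nextValue()) : currentId;
    }
}
```

In this sketch the platform is consulted once per descriptor rather than once per sequence operation, and an object that already carries a valid id never touches the sequence at all.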

The Session used to have some level of support for disconnecting a Sequence, allowing sequencing options to be changed on the fly. Since the Sequence is now stored in the descriptor, which is potentially shared meta-data, the disconnect will be removed. It can still be called directly, but changing sequencing on the fly does not really work in general, as switching from table to native sequencing requires changes in the descriptor's cached insert SQL.

Avoiding ChangeSets for New Objects

In EclipseLink 1.0 change sets are created for both new objects and existing objects that changed. For inserts the object itself is used to write to the database, but for updates the change set is used. To merge into the cache (if caching) the change set is used for both updates and inserts. If the merge needs to merge a reference to an existing object, or an update to an existing object, and the original object is not in the cache (transactional read, gc), the merge uses the object instead of the change set. Change sets are also serialized for cache coordination if enabled; however, new objects are not sent by cache coordination by default.

This optimization will avoid creating ChangeRecords in the ChangeSets for new objects. The change sets will still be created, as they have many dependencies in the commit and merge, and are used to cache certain artifacts such as the CacheKey to optimize the merge and commit. There is one ChangeSet for each new object, and there used to be one ChangeRecord for each attribute; the ChangeRecords will no longer be populated. This improves performance, as these ChangeRecords are not normally required. The commit already uses objects, so it will not change. The merge will be changed to use objects, which was already supported.

If the descriptor uses cache coordination with new objects, then the ChangeRecords will still be created and the old merge will be used. This also serves as somewhat of a backdoor to get the old merge functionality. There is also a backdoor static, ClassDescriptor.shouldUseFullChangeSetsForNewObjects, to allow the old functionality in case of unforeseen issues.
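The following sketch models the behaviour described above in isolation: a change set is always created for a new object, but its records are only populated when a flag requests the old behaviour, and the merge falls back to the object when no records exist. Every name here (SketchChangeRecord, SketchChangeSet, SketchChangeSetBuilder, useFullChangeSetsForNewObjects) is hypothetical; objects are modeled as simple attribute maps, which is an assumption made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-attribute change record; not the EclipseLink ChangeRecord class. */
final class SketchChangeRecord {
    final String attribute;
    final Object newValue;

    SketchChangeRecord(String attribute, Object newValue) {
        this.attribute = attribute;
        this.newValue = newValue;
    }
}

/** Hypothetical per-object change set that keeps a reference to the new object. */
final class SketchChangeSet {
    final Map<String, Object> object; // the new object, modeled as attribute -> value
    final List<SketchChangeRecord> records = new ArrayList<>();

    SketchChangeSet(Map<String, Object> object) {
        this.object = object;
    }
}

final class SketchChangeSetBuilder {
    /** Stand-in for the backdoor static; false selects the optimized path. */
    static boolean useFullChangeSetsForNewObjects = false;

    /** Always creates the change set, but fills records only on the old path. */
    static SketchChangeSet forNewObject(Map<String, Object> object) {
        SketchChangeSet changeSet = new SketchChangeSet(object);
        if (useFullChangeSetsForNewObjects) {
            for (Map.Entry<String, Object> entry : object.entrySet()) {
                changeSet.records.add(new SketchChangeRecord(entry.getKey(), entry.getValue()));
            }
        }
        return changeSet;
    }

    /** Builds the cache copy from records if present, otherwise from the object. */
    static Map<String, Object> merge(SketchChangeSet changeSet) {
        Map<String, Object> cached = new LinkedHashMap<>();
        if (changeSet.records.isEmpty()) {
            cached.putAll(changeSet.object);
        } else {
            for (SketchChangeRecord record : changeSet.records) {
                cached.put(record.attribute, record.newValue);
            }
        }
        return cached;
    }
}
```

Both paths produce the same cached state in this model; the optimized path simply skips allocating one record per attribute.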

If a new object's ChangeSet is referenced by an existing object's ChangeSet that is sent through cache coordination, then its ChangeRecords will be filled in and written during serialization.

Deferring Resume

In EclipseLink 1.0, after a commit() or flush() operation all managed objects had their change tracking reset. This could mean rebuilding the backup clones or clearing their change listeners. Some bookkeeping in the UnitOfWork is also required for a resume.

Commonly in JPA, the EntityManager is closed after a commit. For a managed EntityManager this is always the case (unless extended), so the resume cost commonly serves no purpose. JPA states that changes made before a call to Transaction.begin() are undetermined, so another option is to defer the resume, or possibly even avoid change tracking until the begin().

A persistence unit option will be added to close the EntityManager on commit instead of resuming it. By default this will not be enabled.
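As a rough illustration of the trade-off, the sketch below models a persistence context whose commit either resumes (resetting change tracking for every managed object) or closes without doing that work. SketchPersistenceContext and its members are hypothetical names, and the per-object reset is reduced to a counter; the real resume involves backup clones, change listeners and UnitOfWork bookkeeping as described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical persistence context; not an EclipseLink or JPA class. */
final class SketchPersistenceContext {
    private final boolean closeOnCommit;
    private final List<Object> managed = new ArrayList<>();
    private boolean open = true;
    int trackingResets = 0; // counts the per-object resume work performed

    SketchPersistenceContext(boolean closeOnCommit) {
        this.closeOnCommit = closeOnCommit;
    }

    void manage(Object entity) {
        requireOpen();
        managed.add(entity);
    }

    /** Commits, then either closes the context or resumes it. */
    void commit() {
        requireOpen();
        // Writing changes to the database is omitted from this sketch.
        if (closeOnCommit) {
            managed.clear(); // no change tracking reset is needed
            open = false;
        } else {
            trackingResets += managed.size(); // one reset per managed object
        }
    }

    boolean isOpen() {
        return open;
    }

    private void requireOpen() {
        if (!open) {
            throw new IllegalStateException("persistence context is closed");
        }
    }
}
```

With the option enabled, the context in this model cannot be reused after commit, which is why the specification leaves it disabled by default.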

Testing

Both the existing performance and concurrency tests and public benchmarks will be used to monitor and evaluate performance improvements.

Specific performance testing desired for this release:

  • JPA performance comparison with EclipseLink 1.0
  • core performance comparison with EclipseLink 1.0
  • core concurrency comparison with EclipseLink 1.0
  • JPA cache coordination comparison in clustered environment
  • JPA performance comparison with Hibernate
  • JPA performance comparison with OpenJPA
  • JPA concurrency comparison with Hibernate
  • JPA provider and app server comparison through SPECjAppServer® benchmark

API

  • PersistenceUnitProperties.PERSISTENCE_CONTEXT_CLOSE_ON_COMMIT
  • EntityManagerProperties.PERSISTENCE_CONTEXT_CLOSE_ON_COMMIT

Config files

persistence.xml

  • property: "eclipselink.persistence-context.close-on-commit" = "true" | "false"
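
As a usage sketch, the property could also be supplied programmatically as an entry in a properties map. The helper class CloseOnCommitConfig below is hypothetical; only the property name is taken from this specification.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that builds a properties map containing the new option. */
final class CloseOnCommitConfig {
    // Property name as listed in this specification.
    static final String CLOSE_ON_COMMIT = "eclipselink.persistence-context.close-on-commit";

    /** Returns a properties map with the option set to "true" or "false". */
    static Map<String, String> properties(boolean closeOnCommit) {
        Map<String, String> props = new HashMap<>();
        props.put(CLOSE_ON_COMMIT, String.valueOf(closeOnCommit));
        return props;
    }
}
```

Such a map would typically be passed as the second argument to Persistence.createEntityManagerFactory along with a persistence unit name; whether the option is also honoured per EntityManager depends on the EntityManagerProperties constant listed under API.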

GUI

Documentation

Document new persistence-context close-on-commit option under persistence properties and performance sections.

Document in the release notes that change sets for new objects will no longer have change records.

Open Issues

Issue # Owner Description / Notes
1 Group Should weaving.eager be true or false by default?

Decisions

Issue # Description / Notes Decision
1 What should be default for persistence-context close-on-commit? false (existing)

Future Considerations

Continually improve performance.