Difference between revisions of "EclipseLink/DesignDocs/371950"

*[http://bugs.eclipse.org/371950 Enhancement Request 371950]

== Purpose ==
{{warning|Work in progress|This analysis is in progress.}}

This feature investigates caching the metadata project so that setup can avoid the cost of re-reading multiple orm.xml files and re-processing the annotations on entities within a persistence unit when rebuilding the project is unnecessary.

== Preliminary ==

The persistence unit in eclipselink-annotation-model.jar from JPA testing was chosen for investigation because it is the largest catch-all unit in testing. Gathering accurate numbers to determine the costs and benefits is difficult, as serialization is not yet possible and it is not yet understood what can be shared. The org.eclipse.persistence.sessions.Project object is used as a starting point, as it is the underlying object that gets built from metadata processing and should contain all mapping information for a session/EntityManagerFactory. Caching this object should be sufficient to avoid reprocessing the entire persistence unit from scratch.

The project cannot be serialized as is, and the cost of serializing to a file would depend entirely on file I/O. Initial numbers indicate that loading a session from an existing project into the SessionManager, and then building an EntityManagerFactory/EntityManager from it, takes 1/10 the time of building the initial persistence unit. This number is misleading, though, because the test had to build the project by accessing the default persistence unit, which caused the agent to load and much of the static initialization to be done. When the time to load the default persistence unit was compared with that of a subsequent unit within the same persistence.xml, the subsequent unit took 1/3 of the time. So 2/3 of the initial load time could be due to costs that metadata caching might not be able to avoid; further testing is required.

The next step is to modify org.eclipse.persistence.sessions.Project and its references so that it can be reliably serialized and reused once deserialized.

=== Problems and resolutions ===

1) A few classes are not serializable.

      r) Add the Serializable interface to them as they are encountered.

2) Queries are stored in two places in the session. JPAQueries (named JPQL) are put into the session and processed during login. Other native queries (SQL, stored procedures, etc.) are processed immediately and put into the session. Project itself already has a collection of queries, which are added to the session's query collection when the project is passed to the session's constructor.

      r) Remove the JPAQuery construct and add named JPQL queries directly, in the same way named SQL queries are processed. All queries will then be stored on the project and added to the session later, during deploy.

==== with deserialization and initialization ====

1) Project assumes it has a collection of queries when creating a session, but this collection is marked transient (an ongoing theme).

      r) Remove the transient marker (consequences?). Attributes holding user objects will remain transient.

2) deploy calls convertClassNamesToClasses on the project.

        2a) This results in an NPE since most queries are transient (queries held in DescriptorQueryManager are almost entirely transient).

             2a r) Do not call convertClassNamesToClasses on a serialized project, since the classloader is likely going to be the application loader anyway.

      r) (?) The loader will need to be looked at to make sure the correct one is used, avoiding the need for this method call.

As we do not serialize queries other than existence checks, I assume this is because they are not needed on projects used by remote sessions. Changing this will affect the usage/performance of remote sessions, which needs to be investigated further.

3) Customizers are called as they are processed and are not stored on the project/session (see processCustomizers on MetadataProcessor). This will require refactoring so that a string representation can be added to the project and the customizers called after the project is serialized, so that they are not called twice on the same project.
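
The transient-field behaviour behind items 1 and 2a can be reproduced with plain Java serialization. The sketch below is not EclipseLink code: DemoProject, its fields and TransientFieldDemo are hypothetical stand-ins, used only to show that a transient collection comes back as null after a round trip, which is the kind of state that leads to an NPE later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a project-like object; not the EclipseLink Project class.
class DemoProject implements Serializable {
    private static final long serialVersionUID = 1L;

    String name = "demo";
    // Marked transient, so Java serialization skips this field.
    transient List<String> queries = new ArrayList<>();
}

class TransientFieldDemo {
    // Serializes the object to a byte array and reads it back.
    static DemoProject roundTrip(DemoProject original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (DemoProject) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        DemoProject original = new DemoProject();
        original.queries.add("findAllEmployees");

        DemoProject copy = roundTrip(original);
        System.out.println(copy.name);    // prints "demo"
        System.out.println(copy.queries); // prints "null": field initializers are not re-run
    }
}
```

Any code that iterates over the collection after deserialization therefore has to either find it non-transient or re-create it first.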

=== Open Items ===

*How this should interact with extensibility and RCM refresh commands. A user might not wish to get the cached project when signalling that the metadata has been refreshed, so there will need to be a way to override it; but once the metadata is read in, others on the server might want to use the cached version. The timing of caching the project might be a factor with the current setup if the project isn't built and cached before the RCM refresh command goes out.
*Dynamic classes are currently built using the MetadataDescriptors, not the Project/Descriptor classes that will be cached. Supporting dynamic entities will require changes to how they are created, and is left outside the scope of this feature.
*Serialization could be a problem if some nodes are using an EclipseLink version different from the one the project was initially serialized with, i.e. if one node in a cluster is patched while the others are still in the process of being patched.
* ClassExtractor/MethodExtractor have instances that might not be serializable. Need to verify that this works correctly once done.
** Same potential problem with converter.addConversionValue(). The converter stores object instances which might not be serializable.
* structureConverters are added post login to the databasePlatform object. Need a way to store them beforehand in the project or they will be lost.
* Queries are stored in two places in the session. JPAQueries (named JPQL) are put into the session and processed during login. Other native queries (SQL, stored procedures, etc.) are processed immediately and put into the session. Project itself already has a collection of queries, which are added to the session's query collection when the project is passed to the session's constructor.
* This currently isn't working with weaving, as the project contains references to class files that cause the files to get loaded while attempting to weave. The code is the same (EclipseLink goes through predeploy twice when weaving is involved), so the project cache will either need to be ignored when weaving or the project modified to not hold class instances and use strings instead.
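
For the version-mismatch item above, one possible guard is to store the writer's version next to the serialized project and to treat a mismatch as a cache miss. This is only a sketch of that idea and not part of the design: VersionedProject, its fields and the version strings are hypothetical.

```java
import java.io.Serializable;

// Hypothetical envelope that records which version wrote the cached project.
class VersionedProject implements Serializable {
    private static final long serialVersionUID = 1L;

    final String writerVersion;
    final Serializable project;

    VersionedProject(String writerVersion, Serializable project) {
        this.writerVersion = writerVersion;
        this.project = project;
    }

    // Returns the cached project only when the reader runs the same version;
    // returns null otherwise, so the caller can rebuild from metadata.
    Serializable projectFor(String readerVersion) {
        return writerVersion.equals(readerVersion) ? project : null;
    }
}
```

A node patched to a newer version would then rebuild its project instead of reading one written by an older node.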

Revision as of 15:59, 23 May 2012

EclipseLink Metadata Cache

Purpose

This feature investigates caching the metadata project so that setup can avoid the cost of re-reading multiple orm.xml files and re-processing the annotations on entities within a persistence unit when rebuilding the project is unnecessary.

Requirements

  1. EntityManagerFactory and EntityManager instance creation must honor properties that override metadata settings/properties, the same as it would if project caching were not used.
  2. Project caching must allow weaving to happen.
  3. It must be configurable, so that alternate implementations can cache the project differently.
  4. The project will be read from/written to the cache prior to login, before String class names are converted into Classes.

Design

PersistenceUnitProperties

public static final String PROJECT_CACHE_ACCESSOR = "eclipselink.project-cache-accessor";

will be the base property to configure this feature. It takes a string value naming a shipped implementation, or the <package.class> name of a class implementing the ProjectCacheAccessor interface. By default, no project cache accessor is used.

The "eclipselink.project-cache-accessor.<implementationShortName>" subset of properties will be used for implementation specific properties.
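
As an illustration of how these properties might be assembled, the sketch below builds a plain property map using the names quoted in this document, including the "java-serialization" value and its ".file" property described under Included Implementation. The class and method names (ProjectCacheProperties, javaSerialization) are hypothetical helpers, not EclipseLink API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; only the property names and the "java-serialization"
// value come from the design text.
class ProjectCacheProperties {
    static final String PROJECT_CACHE_ACCESSOR = "eclipselink.project-cache-accessor";
    static final String JAVA_SERIALIZATION_FILE =
        PROJECT_CACHE_ACCESSOR + ".java-serialization.file";

    // Builds the properties selecting the shipped implementation and its file location.
    static Map<String, String> javaSerialization(String cacheFile) {
        Map<String, String> props = new HashMap<>();
        props.put(PROJECT_CACHE_ACCESSOR, "java-serialization");
        props.put(JAVA_SERIALIZATION_FILE, cacheFile);
        return props;
    }
}
```

Such a map could presumably be passed as the properties argument of Persistence.createEntityManagerFactory(String, Map), or the same keys set in persistence.xml; how the feature actually reads them is not specified here.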

Interface

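import java.util.Properties;
import org.eclipse.persistence.sessions.Project;

public interface ProjectCacheAccessor {

 // Reads the project from the cache.
 public Project retrieveProject(Properties properties, ClassLoader loader);
 // Writes the project to the cache.
 public void storeProject(Project project, Properties properties);

}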
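
The intended call pattern around this interface might look like the following sketch: ask the accessor first, and only process metadata (then store the result) on a miss. To keep the example self-contained it uses stand-in types (SketchProject, SketchProjectCacheAccessor) instead of the EclipseLink classes. The in-memory accessor, the ProjectBootstrap class and the convention that retrieveProject returns null on a miss are all assumptions.

```java
import java.util.Properties;

// Stand-in for org.eclipse.persistence.sessions.Project so the sketch compiles alone.
class SketchProject {
    final String name;
    SketchProject(String name) { this.name = name; }
}

// Mirrors the interface above, using the stand-in type.
interface SketchProjectCacheAccessor {
    SketchProject retrieveProject(Properties properties, ClassLoader loader);
    void storeProject(SketchProject project, Properties properties);
}

// Hypothetical accessor that keeps at most one project in memory.
class InMemoryAccessor implements SketchProjectCacheAccessor {
    private SketchProject cached;

    public SketchProject retrieveProject(Properties properties, ClassLoader loader) {
        return cached; // null signals a cache miss (an assumed convention)
    }

    public void storeProject(SketchProject project, Properties properties) {
        cached = project;
    }
}

class ProjectBootstrap {
    int metadataRuns = 0;

    // Stand-in for the expensive orm.xml and annotation processing.
    SketchProject processMetadata() {
        metadataRuns++;
        return new SketchProject("built-" + metadataRuns);
    }

    // Uses the cached project when present; otherwise builds and stores it.
    SketchProject obtainProject(SketchProjectCacheAccessor accessor,
                                Properties props, ClassLoader loader) {
        SketchProject project = accessor.retrieveProject(props, loader);
        if (project == null) {
            project = processMetadata();
            accessor.storeProject(project, props);
        }
        return project;
    }
}
```

With this pattern, a second call to obtainProject with the same accessor returns the stored project and metadataRuns stays at 1.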

Included Implementation

This feature will include a ProjectCacheAccessor implementation that uses Java serialization to read from/write to a file. It can be used by setting the PROJECT_CACHE_ACCESSOR property to "java-serialization". This implementation also requires the file location to be specified, and relies on the "eclipselink.project-cache-accessor.java-serialization.file" property being defined.
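
A minimal sketch of what such a file-backed implementation could do is shown below, using only JDK serialization. It is not the shipped code: FileProjectStore and its methods are hypothetical, the payload is typed as Serializable/Object rather than Project so the example stands alone, and returning null when the file is missing is an assumption.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical file-backed store; only the property name comes from the design text.
class FileProjectStore {
    static final String FILE_PROPERTY =
        "eclipselink.project-cache-accessor.java-serialization.file";

    // Writes the object to the file named by FILE_PROPERTY.
    static void store(Serializable project, Properties properties) throws IOException {
        String path = properties.getProperty(FILE_PROPERTY);
        if (path == null) {
            throw new IllegalArgumentException(FILE_PROPERTY + " must be set");
        }
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(Paths.get(path)))) {
            out.writeObject(project);
        }
    }

    // Reads the object back, or returns null when no cache file exists yet.
    static Object retrieve(Properties properties) throws IOException, ClassNotFoundException {
        String path = properties.getProperty(FILE_PROPERTY);
        if (path == null) {
            return null;
        }
        Path file = Paths.get(path);
        if (!Files.exists(file)) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        }
    }
}
```

A real implementation would probably also need to resolve classes through the ClassLoader passed to retrieveProject, for example by overriding ObjectInputStream.resolveClass; that part is left out of this sketch.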