Difference between revisions of "EclipseLink/DesignDocs/371950"

*[http://bugs.eclipse.org/371950 Enhancement Request 371950]
 
Latest revision as of 11:04, 10 July 2012

EclipseLink Metadata Cache

Purpose

This feature investigates caching the metadata project so that setup can avoid the cost of reading multiple orm.xml files and processing annotations on the entities in a persistence unit when the project would otherwise be rebuilt unnecessarily.

Requirements

  1. EntityManagerFactory and EntityManager instance creation must honor properties that override metadata settings/properties, exactly as they would if project caching were not used.
  2. Project caching must allow weaving to occur.
  3. It must be configurable so that alternate implementations can cache the project differently.
  4. The project will be read from/written to the cache prior to login and prior to converting String class names into Classes.
  5. It must not break any current functionality, such as remote sessions.

Design

PersistenceUnitProperties

public static final String PROJECT_CACHE = "eclipselink.project-cache"; will be the base property used to configure this feature. It takes a string value naming one of the shipped implementations, or the <package.class> name of a class that implements the ProjectCache interface.

The "eclipselink.project-cache.<implementationShortName>" subset of properties will be used for implementation specific properties.
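As a rough illustration of how these properties might be supplied, the sketch below builds a property map using the names given in this design. The class name ProjectCacheConfig, the helper buildProperties, the persistence unit name, and the file path are hypothetical and used only for illustration; the "java-serialization" value and the file property are the ones described under Included Implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class ProjectCacheConfig {
    // Property names as proposed in this design document.
    static final String PROJECT_CACHE = "eclipselink.project-cache";
    static final String PROJECT_CACHE_FILE =
            "eclipselink.project-cache.java-serialization.file";

    // Hypothetical helper: collects the settings for the shipped
    // java-serialization implementation into one property map.
    static Map<String, Object> buildProperties(String cacheFile) {
        Map<String, Object> props = new HashMap<>();
        props.put(PROJECT_CACHE, "java-serialization");
        props.put(PROJECT_CACHE_FILE, cacheFile);
        return props;
    }

    public static void main(String[] args) {
        // The path is a made-up example.
        Map<String, Object> props = buildProperties("/tmp/example-pu.project");
        // In an application the map would typically be passed to
        // Persistence.createEntityManagerFactory("example-pu", props).
        System.out.println(props);
    }
}
```

A custom implementation would presumably be selected the same way, with the <package.class> name in place of "java-serialization".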

Interface

import java.util.Properties;
import org.eclipse.persistence.sessions.Project;

public interface ProjectCache {
    public Project retrieveProject(Properties properties, ClassLoader loader);
    public void storeProject(Project project, Properties properties);
}


Included Implementation

This feature will include a ProjectCache implementation that uses Java serialization to read the project from, and write it to, a file. It is selected by setting the PROJECT_CACHE property to "java-serialization". This implementation also requires the file location, which it reads from the "eclipselink.project-cache.java-serialization.file" property.
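A minimal sketch of what such a file-based cache could look like is given below. It is not the shipped implementation: the nested Project class is a stand-in for org.eclipse.persistence.sessions.Project so that the example is self-contained, the class name FileProjectCacheSketch is hypothetical, and the methods are static and declare checked exceptions, which the ProjectCache interface above does not.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Properties;

public class FileProjectCacheSketch {
    static final String FILE_PROPERTY =
            "eclipselink.project-cache.java-serialization.file";

    /** Stand-in for org.eclipse.persistence.sessions.Project (illustration only). */
    static class Project implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Project(String name) { this.name = name; }
    }

    /** Serializes the project to the file named by FILE_PROPERTY. */
    static void storeProject(Project project, Properties properties) throws IOException {
        String file = properties.getProperty(FILE_PROPERTY);
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(project);
        }
    }

    /**
     * Reads the project back, or returns null when no cache file exists yet.
     * The loader is ignored here; a real implementation would probably resolve
     * classes through it, for example by overriding ObjectInputStream.resolveClass.
     */
    static Project retrieveProject(Properties properties, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        File file = new File(properties.getProperty(FILE_PROPERTY));
        if (!file.exists()) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Project) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("project-cache", ".ser");
        file.deleteOnExit();
        Properties properties = new Properties();
        properties.setProperty(FILE_PROPERTY, file.getAbsolutePath());

        storeProject(new Project("example-pu"), properties);
        Project cached = retrieveProject(properties, null);
        System.out.println("Read back project: " + cached.name);
    }
}
```

Returning null on a cache miss is one possible convention; it lets the caller fall back to normal metadata processing and then call storeProject.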

Changes required for Java Serialization

Many settings built from the metadata are stored in the session and would be lost when serializing the project, unless changes are made to store them in the project or the metadata is reprocessed.

  1. JPAQueries. These will need to be stored in the project instead of the session. Rather than only JPQL queries being stored as JPAQueries, all named queries will need to be put into this collection and have their processing delayed. The JPAQuery class will be changed to handle native SQL, stored function and PLSQL query processing.
  2. A collection of Strings representing the names of classes to be woven will need to be stored within the project. Metadata is used to gather the classes that need weaving, but not all classes that must be woven have descriptor representations. Without this collection, weaving could not occur without reprocessing the metadata.
  3. The org.eclipse.persistence.sessions.Project references many classes that will need to be made serializable.
  4. org.eclipse.persistence.sessions.Project and its referenced classes have many variables that are set through metadata configuration but are transient; these will need to be serialized.
    • Many of the remaining transients, which are set through the constructors, will need to be lazily initialized in accessors, or null checks added where they are used.
  5. ClassDescriptor will store a list of DescriptorCustomizer class names as strings. These will be processed after serialization instead of immediately, as user methods could add class dependencies that would interfere with weaving.
  6. DescriptorEventManager will store a list of SerializableDescriptorEventHolder. Each SerializableDescriptorEventHolder will contain the raw data needed to build a single DescriptorEventListener that would have been set by metadata processing. This will be used in EntityManagerSetupImpl when processing the deserialized projects to create the appropriate EventListener instances.
    • User classes and methods stored within JPA EntityListeners are not serializable and cannot be handled directly within DescriptorEventManager without adding JPA dependencies.
  7. DatasourceCall will define a readObject method for use when deserializing. This method will correct the parameterTypes collection so that the current static Integer values are used. Default deserialization creates new instances, which breaks the == equality checks used internally.
  8. org.eclipse.persistence.queries.ConstructorResult will maintain a String targetClassName in addition to the transient targetClass. The targetClass will then get set during the convertClassNamesToClasses process.
  9. StructConverter names will also need to be stored in the project, and then moved to the DatabasePlatform instance at the tail end of deploy.
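Two of the items above, the readObject fix-up in item 7 and the class-name/transient-class pair in item 8, can be sketched with simplified stand-ins. The classes Call and Result below are hypothetical and are not the real DatasourceCall or ConstructorResult; the constant names and the roundTrip helper are likewise assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class DeserializationFixups {

    /** Simplified stand-in for DatasourceCall (item 7). */
    static class Call implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Integer LITERAL = Integer.valueOf(1);
        static final Integer MODIFY = Integer.valueOf(2);
        private static final Integer[] CANONICAL = { LITERAL, MODIFY };

        final List<Integer> parameterTypes = new ArrayList<>();

        // Default deserialization creates fresh Integer instances, so swap each
        // entry for the matching static constant to keep == comparisons valid.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            for (int i = 0; i < parameterTypes.size(); i++) {
                for (Integer constant : CANONICAL) {
                    if (constant.equals(parameterTypes.get(i))) {
                        parameterTypes.set(i, constant);
                    }
                }
            }
        }
    }

    /** Simplified stand-in for ConstructorResult (item 8). */
    static class Result implements Serializable {
        private static final long serialVersionUID = 1L;
        final String targetClassName;   // serialized
        transient Class<?> targetClass; // rebuilt after deserialization

        Result(String targetClassName) { this.targetClassName = targetClassName; }

        void convertClassNamesToClasses(ClassLoader loader) throws ClassNotFoundException {
            targetClass = Class.forName(targetClassName, true, loader);
        }
    }

    /** Serializes a value to memory and reads it back. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Call call = new Call();
        call.parameterTypes.add(Call.MODIFY);
        Call copy = roundTrip(call);
        System.out.println("identity preserved: "
                + (copy.parameterTypes.get(0) == Call.MODIFY));

        Result result = roundTrip(new Result("java.lang.String"));
        result.convertClassNamesToClasses(ClassLoader.getSystemClassLoader());
        System.out.println("resolved class: " + result.targetClass.getName());
    }
}
```

Keeping only the class name in the serialized form means that nothing in the cached project forces a user class to load before weaving has had a chance to run.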

Testing

This requires:

  1. Unit tests
    • Unit tests to verify that the provided implementation can write and read back projects built from JPA metadata. For the Java serialization implementation, this will be basic testing to ensure that the project serializes out correctly.
    • Unit tests that compare a newly written-out project against a previously written-out project will be needed to identify when settings are added that prevent the use of projects created/stored with prior EclipseLink versions. This testing will only be used to identify EclipseLink backward compatibility issues with previously stored projects. There will be no requirement to prevent these compatibility issues; the goal is to make developers aware of the issues caused and to reduce them.
  2. Different test configurations that run against all tests, much like static weaving is currently tested:
    • the agent/weaver reading metadata with the runtime using a cached project
    • the agent/weaver and runtime both using a cached project

Open Issues

    • The need to store StructConverters in the project was missed because there were no test failures indicating a problem. A string storing the StructConverter implementation class name might be enough, but this needs looking into.
  1. This feature runs into the same multitenant issues as MetadataSource, such as requiring a different project for different tenants or when weaving.
  2. The contract for building DescriptorEventListeners needs to be laid out in the ProjectCache. It might be better to use a class instead of a list of values where order might be important.
  3. Documentation will be needed so that users understand that they may need a way to verify that the version of EclipseLink used to store their projects works with the version used to load them, and how to handle deploying applications whose metadata changes make the cached/stored project obsolete.
  4. Investigate adding an option to store the project during static weaving.