Difference between revisions of "EclipseLink/DesignDocs/371950"

*[http://bugs.eclipse.org/371950 Enhancement Request 371950]
 
Latest revision as of 11:04, 10 July 2012

EclipseLink Metadata Cache

Purpose

This feature looks at caching the metadata project so that setup can avoid the cost of reading multiple orm.xml files and processing annotations on the entities in a persistence unit when rebuilding the project is unnecessary.

Requirements

  1. EntityManagerFactory and EntityManager instance creation must accept properties that override metadata settings/properties, the same as they would if project caching were not used.
  2. Project caching must allow weaving to occur.
  3. It must be configurable to allow alternate implementations to cache the project differently.
  4. The project will be read from/written to the cache prior to login and prior to converting String class names into Classes.
  5. It must not break any current functionality, such as remote sessions.

Design

PersistenceUnitProperties

public static final String PROJECT_CACHE = "eclipselink.project-cache"; will be the base property to configure this feature. It will take a string value that is either the short name of a shipped implementation or the <package.class> name of a class implementing the ProjectCache interface.

The "eclipselink.project-cache.<implementationShortName>" subset of properties will be used for implementation specific properties.
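As an illustration of the property layout described above, the following is a minimal sketch of how an implementation might pick out its own properties. The class name, the helper method, the choice to strip the prefix from the returned keys, and the "my-cache" short name used in the usage note are all assumptions made for this example, not part of the design.

```java
import java.util.Properties;

public class ProjectCachePropertiesSketch {
    // Base property name quoted from the design text.
    public static final String PROJECT_CACHE = "eclipselink.project-cache";

    /**
     * Hypothetical helper: returns only the entries named
     * eclipselink.project-cache.<shortName>.<key>, keyed by <key>.
     */
    public static Properties implementationProperties(Properties all, String shortName) {
        String prefix = PROJECT_CACHE + "." + shortName + ".";
        Properties subset = new Properties();
        for (String name : all.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                // Drop the shared prefix so the implementation sees short keys.
                subset.setProperty(name.substring(prefix.length()), all.getProperty(name));
            }
        }
        return subset;
    }
}
```

With a persistence unit that sets "eclipselink.project-cache" to "my-cache" and "eclipselink.project-cache.my-cache.file" to a path, calling implementationProperties(props, "my-cache") would yield a single entry keyed "file".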

Interface

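import java.util.Properties;
import org.eclipse.persistence.sessions.Project;
public interface ProjectCache {
 // Reads a previously stored project back from the cache.
 public Project retrieveProject(Properties properties, ClassLoader loader);
 // Writes the project to the cache.
 public void storeProject(Project project, Properties properties);
}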


Included Implementation

This feature will include a ProjectCache implementation that uses Java serialization to read from/write to a file. It can be used by specifying the PROJECT_CACHE property with a value of "java-serialization". This implementation also requires the file location to be specified, and relies on the "eclipselink.project-cache.java-serialization.file" property being defined.
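The file-based approach described above can be sketched roughly as follows. This is not the shipped implementation: the class name is hypothetical, any Serializable object stands in for the Project so the example runs without EclipseLink on the classpath, and resolving classes through the supplied loader is an assumption about how the ClassLoader argument would be used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class FileProjectCacheSketch {
    // Property name quoted from the design text.
    public static final String FILE_PROPERTY = "eclipselink.project-cache.java-serialization.file";

    /** Writes the stand-in project to the configured file using Java serialization. */
    public static void storeProject(Serializable project, Properties properties) throws IOException {
        Path file = Paths.get(properties.getProperty(FILE_PROPERTY));
        try (OutputStream raw = Files.newOutputStream(file);
             ObjectOutputStream out = new ObjectOutputStream(raw)) {
            out.writeObject(project);
        }
    }

    /** Returns the stored object, or null when no cache file exists yet. */
    public static Object retrieveProject(Properties properties, final ClassLoader loader)
            throws IOException, ClassNotFoundException {
        Path file = Paths.get(properties.getProperty(FILE_PROPERTY));
        if (!Files.exists(file)) {
            return null;
        }
        try (InputStream raw = Files.newInputStream(file);
             ObjectInputStream in = new ObjectInputStream(raw) {
                 @Override
                 protected Class<?> resolveClass(ObjectStreamClass desc)
                         throws IOException, ClassNotFoundException {
                     try {
                         // Prefer the loader handed to the cache.
                         return Class.forName(desc.getName(), false, loader);
                     } catch (ClassNotFoundException e) {
                         // Fall back to the default lookup (covers primitive types).
                         return super.resolveClass(desc);
                     }
                 }
             }) {
            return in.readObject();
        }
    }
}
```

Returning null for a missing file is a design choice of this sketch; it lets a caller fall back to normal metadata processing and then store the result.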

Changes required for Java Serialization

Many settings built from the metadata are stored in the session and would be lost when serializing the project unless changes are made to store them or the metadata is reprocessed.

  1. JPAQueries. These will need to be stored in the project instead of the session. Instead of only JPQL queries being stored as JPAQueries, all named queries will need to be put into this collection and have their processing delayed. The JPAQuery class will be changed to handle native SQL, stored function and PLSQL query processing.
  2. A collection of Strings naming the classes to be woven will need to be stored within the project. Metadata is used to gather the classes that need weaving, but not every class that must be woven has a descriptor representation. Without this collection, weaving could not occur without reprocessing the metadata.
  3. The org.eclipse.persistence.sessions.Project references many classes that will need to be made serializable.
  4. org.eclipse.persistence.sessions.Project and referenced classes have many variables set through metadata configuration that are transient and will need to be serialized.
    • Many remaining transients set through the constructors will need to be lazily initialized in accessors, or null checks added where they are used.
  5. ClassDescriptor will store a list of DescriptorCustomizer class names as Strings. These will be processed after serialization instead of immediately, as user methods could add class dependencies that would interfere with weaving.
  6. DescriptorEventManager will store a list of SerializableDescriptorEventHolder. Each SerializableDescriptorEventHolder will contain the raw data needed to build a single DescriptorEventListener that would have been set by metadata processing. This will be used in EntityManagerSetupImpl when processing the deserialized projects to create the appropriate EventListener instances.
    • User classes and methods stored within JPA EntityListeners are not serializable and cannot be handled directly within DescriptorEventManager without adding JPA dependencies.
  7. DatasourceCall will define a readObject method that runs during deserialization. This method will correct the parameterTypes collection so that the current static Integer values are used. Default deserialization creates new instances, which breaks the == comparisons used internally.
  8. org.eclipse.persistence.queries.ConstructorResult will maintain a String targetClassName in addition to the transient targetClass. The targetClass will then be set during the convertClassNamesToClasses process.
  9. StructConverter names will also need to be stored in the project, and then moved to the DatabasePlatform instance at the tail end of deploy.
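Two of the changes listed above (items 7 and 8) can be illustrated with plain Java serialization. The sketch below uses hypothetical stand-in classes and constants rather than the real DatasourceCall and ConstructorResult; only the readObject hook, the transient keyword and Class.forName are standard Java.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializationFixes {

    /** Stand-in for a call whose parameter types are compared with ==. */
    public static class CallSketch implements Serializable {
        private static final long serialVersionUID = 1L;
        // Hypothetical canonical markers; the real constants belong to DatasourceCall.
        public static final Integer LITERAL = Integer.valueOf(1);
        public static final Integer MODIFY = Integer.valueOf(2);

        private List<Integer> parameterTypes = new ArrayList<>();

        public void addType(Integer type) { parameterTypes.add(type); }

        // Identity comparison, mirroring the internal == checks the text describes.
        public boolean isLiteral(int index) { return parameterTypes.get(index) == LITERAL; }

        // After default deserialization, swap copies for the canonical static instances.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            for (int i = 0; i < parameterTypes.size(); i++) {
                Integer value = parameterTypes.get(i);
                if (LITERAL.equals(value)) {
                    parameterTypes.set(i, LITERAL);
                } else if (MODIFY.equals(value)) {
                    parameterTypes.set(i, MODIFY);
                }
            }
        }
    }

    /** Stand-in for a result that keeps a class name next to a transient Class. */
    public static class ResultSketch implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String targetClassName;
        private transient Class<?> targetClass; // not written out, so null after deserialization

        public ResultSketch(String targetClassName) { this.targetClassName = targetClassName; }

        // Resolve the stored name once a suitable loader is available.
        public void convertClassNamesToClasses(ClassLoader loader) throws ClassNotFoundException {
            targetClass = Class.forName(targetClassName, true, loader);
        }

        public Class<?> getTargetClass() { return targetClass; }
    }

    /** Serializes and deserializes a value in memory. */
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T roundTrip(T value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }
}
```

The same pattern of storing a name and resolving it later is what lets the cached project be read before String class names are converted into Classes.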

Testing

This requires:

  1. Unit tests
    • Unit tests verifying that the provided implementation can write and read back projects built from JPA metadata. For the Java serialization implementation, this will be basic testing to ensure that the project serializes out correctly.
    • Unit tests that compare the written-out project to a previously written-out project will be needed to identify when settings are added that prevent the use of projects created/stored with prior EclipseLink versions. This testing will only be used to identify EclipseLink backward compatibility issues with previously stored projects. There is no requirement to prevent these compatibility issues; the goal is to make developers aware of the issues caused and to reduce them.
  2. Different test configurations that run against all tests, much like static weaving is currently tested:
    • the agent/weaver reading metadata with the runtime using a cached project
    • the agent/weaver and runtime both using a cached project

Open Issues

    • StructConverter storage was initially missed because there were no test failures indicating a problem. A string storing the StructConverter implementation class name might be enough, but this needs looking into.
  1. This feature runs into the same multitenant issues as MetadataSource, such as requiring a different project for different tenants or when weaving.
  2. The contract for building DescriptorEventListeners needs to be laid out in the ProjectCache. It might be better to use a class instead of a list of values where order might be important.
  3. Documentation will be needed to ensure users understand that they may need a way to verify that the version of EclipseLink used to store their projects works with the version used to load them, and how to handle deploying applications with metadata changes that make the cached/stored project obsolete.
  4. Investigate adding an option to store the project during static weaving.
