Difference between revisions of "EclipseLink/DesignDocs/371950"

From Eclipsepedia

(21 intermediate revisions by one user not shown)
Line 1: Line 1:
= EclipseLink Metadata Cache =
+
= EclipseLink Metadata Cache =
  
 
*[http://bugs.eclipse.org/371950 Enhancement Request 371950]
 
*[http://bugs.eclipse.org/371950 Enhancement Request 371950]
  
{{warning|Work in progress|This analysis is in progress.}}
+
== Purpose ==
 
+
== Purpose ==
+
  
 
This feature is to look at caching the metadata project so that the setup can avoid costs associated with reading in multiple orm.xml and annotation processing on entities within a persistence unit to rebuild it unnecessarily.  
 
This feature is to look at caching the metadata project so that the setup can avoid costs associated with reading in multiple orm.xml and annotation processing on entities within a persistence unit to rebuild it unnecessarily.  
  
== Preliminary ==
+
== Requirements  ==
 +
 
 +
#EntityManagerFactory and EntityManager instance creation must accept properties that override metadata settings/properties, the same as when project caching is not used.
 +
#Project caching must allow weaving to occur.
 +
#It must be configurable to allow alternate implementations to cache the project differently.
 +
#The project will be read from/written to the cache prior to login and prior to converting String class names into Classes.
 +
#It must not break any current functionality, such as remote sessions.
 +
 
 +
== Design  ==
 +
 
 +
=== PersistenceUnitProperties  ===
  
The persistence unit in eclipselink-annotation-model.jar from JPA testing was chosen for investigation as it is the largest catch-all unit in testing. Gathering accurate numbers to determine the costs and benefits is difficult, as serialization is not possible and it is not yet understood what can be shared. The org.eclipse.persistence.sessions.Project object is going to be used as a starting point, as it is the underlying object that gets built from metadata processing and should contain all mapping information for a session/EntityManagerFactory. Caching this object should be sufficient to prevent the need to reprocess the entire persistence unit as would be done from scratch.
+
public static final String PROJECT_CACHE = "eclipselink.project-cache"; will be the base property to configure this feature. It takes a string value that is either the short name of a shipped implementation or the <package.class> name of a class implementing the ProjectCache interface.
  
The project cannot be serialized as is, and the process of serializing to a file would depend entirely on file I/O. Initial numbers gathered indicate that creating a session from an existing project into the SessionManager, and then building an EntityManagerFactory/EntityManager from it, takes 1/10 of the time needed to build the initial persistence unit. This number is incorrect though, as the test had to build the project by accessing the default persistence unit, thereby causing the agent to load and much of the static initialization to be done. Comparing the time to load a default persistence unit to a subsequent unit within the same persistence.xml, the subsequent persistence unit took 1/3 of the time. So 2/3 could have been due to costs that might not be avoidable through metadata caching; further testing is required.
+
The "eclipselink.project-cache.<implementationShortName>" subset of properties will be used for implementation specific properties.  
  
The next step is to modify the org.eclipse.persistence.sessions.Project and its references so that it can be reliably serialized and reused after deserialization.
+
=== Interface  ===
  
=== Problems and resolutions ===
+
<source lang="java">
 +
public interface ProjectCache {
 +
public Project retrieveProject(Properties properties, ClassLoader loader);
 +
public void storeProject(Project project, Properties properties);
 +
}
 +
</source>
  
1) A few classes are not serializable.
+
<br>
  
     r) Add the Serializable interface to them when encountered. Project is now serializable.
+
=== Included Implementation  ===
  
==== with deserialization and initialization ====
+
This feature will include a ProjectCache implementation that uses Java serialization to read from/write to a file. It is selected by setting the PROJECT_CACHE property to "java-serialization". This implementation also requires the file location to be specified through the "eclipselink.project-cache.java-serialization.file" property.
  
1) Project assumes it has a collection of queries when creating a session, but this is marked transient (an ongoing theme)
+
==== Changes required for Java Serialization  ====
  
     r) Remove the transient marker. (Consequences?)
+
Many settings built from the metadata are stored in the session and would be lost when serializing the project, unless changes are made to store them or the metadata is reprocessed.
  
2) deploy calls convertClassNamesToClasses on the project
+
#JPAQueries. These will need to be stored in the project instead of the session. Instead of only JPQL queries being stored as JPAQueries, all named queries will need to be put into this collection and have their processing delayed. The JPAQuery class will be changed to handle native SQL, stored function and PLSQL query processing.
 +
#A collection of Strings representing names of classes to be weaved will need to be stored within the project. Metadata is used to gather the classes that are needed for weaving, but not all classes required to be weaved will have descriptor representations. Without this, weaving could not occur without reprocessing the metadata.
 +
#The org.eclipse.persistence.sessions.Project references many classes that will need to be made serializable.
 +
#org.eclipse.persistence.sessions.Project and referenced classes have many variables set through metadata configuration that are transient and will need to be serialized.
 +
#*many remaining transients set through the constructors will need to be lazy initialized in accessors or null checks added where they are used.
 +
#ClassDescriptor will store a list of DescriptorCustomizer class names as Strings. These will be processed after serialization instead of immediately, as user methods could add class dependencies that would interfere with weaving.
 +
#DescriptorEventManager will store a list of SerializableDescriptorEventHolder. Each SerializableDescriptorEventHolder will contain the raw data needed to build a single DescriptorEventListener that would have been set by metadata processing. This will be used in EntityManagerSetupImpl when processing the deserialized projects to create the appropriate EventListener instances.
 +
#*User classes and methods stored within JPA EntityListeners are not serializable and cannot be handled directly within DescriptorEventManager without adding jpa dependencies.
 +
#DatasourceCall will define a readObject method to be used when deserializing. This method will correct the parameterTypes collection so that the current static Integer values are used. Default deserialization causes new instances to be used, breaking the == equality used internally.
 +
#org.eclipse.persistence.queries.ConstructorResult will maintain a String targetClassName in addition to the transient targetClass. The targetClass will then get set during the convertClassNamesToClasses process.
 +
#StructConverter names will also need to be stored in the project, and then moved to the DatabasePlatform instance at the tail end of deploy.
  
        2a) Results in an NPE since most queries are transient (queries held in DescriptorQueryManager are almost entirely transient)
+
=== Testing  ===
  
&nbsp;&nbsp;
+
This requires:
2a r) Do not call convertClassNamesToClasses on the serialized project, since the classloader is likely going to be the application loader anyway.
+
  
+
#Unit tests
r) (?)The loader will need to be looked at to make sure we use the correct one avoiding the need for this method call.
+
#*Add unit tests to verify that the provided implementation can write and read back projects built from JPA metadata. For the Java serialization implementation, this will be basic testing to ensure that the project serializes out correctly.
 +
#*Unit tests that compare the written-out project to a previously written-out project will be needed to identify when settings are added that prevent the use of projects created/stored with prior EclipseLink versions. This testing will only be used to identify EclipseLink backward compatibility issues with prior stored projects. There is no requirement to prevent these compatibility issues; the goal is to make developers aware of the issues caused and to reduce them.
 +
#different test configurations that run against all tests, much like static weaving is currently tested:
 +
#*the agent/weaver reading metadata with the runtime using a cached project
 +
#*the agent/weaver and runtime both using a cached project
 +
=== Open Issues  ===
  
As we do not serialize queries other than existence checks, I assume this is because they are not needed on projects used by remote sessions. Changing this will impact usage/performance of remote sessions, which needs to be investigated more.
+
#*This was missed because there were no test failures indicating a problem. A string storing the StructConverter implementation class name might be enough but needs looking into.
 +
#This feature runs into the same multitenant issues as Metadatasource, such as requiring a different project for different tenants or when weaving.
 +
#The contract for building DescriptorEventListeners needs to be laid out in the ProjectCache. It might be better to use a class instead of a list of values where order might be important.
 +
#Documentation will be needed to ensure users understand that they may need a way to verify that the version of EclipseLink used to store their projects works with the version used to load them, and how to handle deploying applications with metadata changes that make the cached/stored project obsolete.
 +
#Investigate adding an option to store the project while static weaving.

Latest revision as of 11:04, 10 July 2012

EclipseLink Metadata Cache

Purpose

This feature looks at caching the metadata project so that setup can avoid the costs of reading multiple orm.xml files and processing annotations on the entities within a persistence unit when rebuilding the project is unnecessary.

Requirements

  1. EntityManagerFactory and EntityManager instance creation must accept properties that override metadata settings/properties, the same as when project caching is not used.
  2. Project caching must allow weaving to occur.
  3. It must be configurable to allow alternate implementations to cache the project differently.
  4. The project will be read from/written to the cache prior to login and prior to converting String class names into Classes.
  5. It must not break any current functionality, such as remote sessions.

Design

PersistenceUnitProperties

public static final String PROJECT_CACHE = "eclipselink.project-cache"; will be the base property to configure this feature. It takes a string value that is either the short name of a shipped implementation or the <package.class> name of a class implementing the ProjectCache interface.

The "eclipselink.project-cache.<implementationShortName>" subset of properties will be used for implementation specific properties.
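
As a rough illustration of how these properties might be read, the sketch below picks out the implementation-specific subset by prefix. Only the property names come from this document. The ProjectCacheSettings class, its methods, and the rule that a value containing a dot is a class name are assumptions made for the example, not EclipseLink API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of EclipseLink: it only illustrates how the
// base property and the implementation-specific subset could be read.
final class ProjectCacheSettings {
    static final String PROJECT_CACHE = "eclipselink.project-cache";

    // Returns the entries under "eclipselink.project-cache.<shortName>.",
    // keyed by the remainder of the property name.
    static Map<String, Object> implementationProperties(Map<String, ?> props, String shortName) {
        String prefix = PROJECT_CACHE + "." + shortName + ".";
        Map<String, Object> result = new HashMap<>();
        for (Map.Entry<String, ?> entry : props.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }

    // Assumption: a value containing a '.' is treated as a <package.class> name,
    // anything else as the short name of a shipped implementation.
    static boolean isClassName(String value) {
        return value.indexOf('.') >= 0;
    }
}
```

With this helper, a property map containing "eclipselink.project-cache.java-serialization.file" would yield a one-entry map keyed by "file" for the short name "java-serialization".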

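Interface

public interface ProjectCache {
 // Retrieves a previously stored project; the properties identify where it is cached.
 public Project retrieveProject(Properties properties, ClassLoader loader);
 // Stores the project so that a later retrieveProject call can reuse it.
 public void storeProject(Project project, Properties properties);
}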


Included Implementation

This feature will include a ProjectCache implementation that uses Java serialization to read from/write to a file. It is selected by setting the PROJECT_CACHE property to "java-serialization". This implementation also requires the file location to be specified through the "eclipselink.project-cache.java-serialization.file" property.
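
A minimal sketch of what such a file-based implementation could look like follows. The Project class here is a stand-in so the example compiles on its own, and FileProjectCache, its null-on-miss behaviour and its error handling are assumptions; only the method signatures and the file property name are taken from this document.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Objects;
import java.util.Properties;

// Stand-in for org.eclipse.persistence.sessions.Project so the sketch compiles alone.
class Project implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    Project(String name) { this.name = name; }
}

// Hypothetical implementation; the class name and behaviour are assumptions.
class FileProjectCache {
    static final String FILE_PROPERTY = "eclipselink.project-cache.java-serialization.file";

    private static File cacheFile(Properties properties) {
        String path = properties.getProperty(FILE_PROPERTY);
        return new File(Objects.requireNonNull(path, FILE_PROPERTY + " must be set"));
    }

    // Returns the stored project, or null when no cache file exists yet.
    // The loader is ignored in this sketch; a real implementation would
    // probably resolve classes through it.
    public Project retrieveProject(Properties properties, ClassLoader loader) {
        File file = cacheFile(properties);
        if (!file.exists()) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Project) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not read cached project from " + file, e);
        }
    }

    // Writes the project to the configured file, replacing any earlier copy.
    public void storeProject(Project project, Properties properties) {
        File file = cacheFile(properties);
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(project);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Returning null on a cache miss is one possible contract; it would let the caller fall back to normal metadata processing and then call storeProject.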

Changes required for Java Serialization

Many settings built from the metadata are stored in the session and would be lost when serializing the project, unless changes are made to store them or the metadata is reprocessed.

  1. JPAQueries. These will need to be stored in the project instead of the session. Instead of only JPQL queries being stored as JPAQueries, all named queries will need to be put into this collection and have their processing delayed. The JPAQuery class will be changed to handle native SQL, stored function and PLSQL query processing.
  2. A collection of Strings representing names of classes to be weaved will need to be stored within the project. Metadata is used to gather the classes that are needed for weaving, but not all classes required to be weaved will have descriptor representations. Without this, weaving could not occur without reprocessing the metadata.
  3. The org.eclipse.persistence.sessions.Project references many classes that will need to be made serializable.
  4. org.eclipse.persistence.sessions.Project and referenced classes have many variables set through metadata configuration that are transient and will need to be serialized.
    • many remaining transients set through the constructors will need to be lazy initialized in accessors or null checks added where they are used.
  5. ClassDescriptor will store a list of DescriptorCustomizer class names as Strings. These will be processed after serialization instead of immediately, as user methods could add class dependencies that would interfere with weaving.
  6. DescriptorEventManager will store a list of SerializableDescriptorEventHolder. Each SerializableDescriptorEventHolder will contain the raw data needed to build a single DescriptorEventListener that would have been set by metadata processing. This will be used in EntityManagerSetupImpl when processing the deserialized projects to create the appropriate EventListener instances.
    • User classes and methods stored within JPA EntityListeners are not serializable and cannot be handled directly within DescriptorEventManager without adding jpa dependencies.
  7. DatasourceCall will define a readObject method to be used when deserializing. This method will correct the parameterTypes collection so that the current static Integer values are used. Default deserialization causes new instances to be used, breaking the == equality used internally.
  8. org.eclipse.persistence.queries.ConstructorResult will maintain a String targetClassName in addition to the transient targetClass. The targetClass will then get set during the convertClassNamesToClasses process.
  9. StructConverter names will also need to be stored in the project, and then moved to the DatabasePlatform instance at the tail end of deploy.
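
The readObject fix described in item 7 can be sketched as follows. CallSketch and its two constants are hypothetical stand-ins, not the real DatasourceCall fields; the example only shows the general technique of swapping deserialized Integer copies for the canonical static instances so that == comparisons keep working.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a call object that tags parameters with static markers.
class CallSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    static final Integer LITERAL = Integer.valueOf(1);
    static final Integer MODIFY = Integer.valueOf(2);

    final List<Integer> parameterTypes = new ArrayList<>();

    boolean isLiteral(int index) {
        // Identity comparison on purpose, mirroring the == checks item 7 mentions.
        return parameterTypes.get(index) == LITERAL;
    }

    // Invoked by Java serialization; replaces each deserialized Integer copy
    // with the canonical static instance of equal value.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        for (int i = 0; i < parameterTypes.size(); i++) {
            Integer type = parameterTypes.get(i);
            if (LITERAL.equals(type)) {
                parameterTypes.set(i, LITERAL);
            } else if (MODIFY.equals(type)) {
                parameterTypes.set(i, MODIFY);
            }
        }
    }

    // Serializes the call to memory and reads it back, for demonstration only.
    static CallSketch roundTrip(CallSketch call) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(call);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CallSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }
}
```

After CallSketch.roundTrip, isLiteral still answers correctly for entries that were added as LITERAL, because readObject has restored the shared instances.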

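Items 4 and 8 above both deal with transient state that is missing after deserialization. The sketch below shows one way this could be handled: a String class name is kept alongside a transient Class that is resolved later, and a transient collection is lazily initialized in its accessor. ConstructorResultSketch and the columns field are hypothetical; only targetClassName, targetClass and convertClassNamesToClasses are names used in this document.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in illustrating items 4 and 8; not the real ConstructorResult.
class ConstructorResultSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String targetClassName;    // serialized with the object
    private transient Class<?> targetClass;  // not serialized, null until resolved
    private transient List<String> columns;  // not serialized, created on first use

    ConstructorResultSketch(String targetClassName) {
        this.targetClassName = targetClassName;
    }

    // Resolves the stored class name once the correct class loader is known.
    void convertClassNamesToClasses(ClassLoader loader) {
        try {
            targetClass = Class.forName(targetClassName, true, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Cannot load " + targetClassName, e);
        }
    }

    Class<?> getTargetClass() {
        return targetClass;
    }

    // Lazy initialization avoids a NullPointerException on a deserialized
    // instance, whose transient fields are not set by any constructor.
    List<String> getColumns() {
        if (columns == null) {
            columns = new ArrayList<>();
        }
        return columns;
    }
}
```

Keeping only the name in the serialized form means the stored project does not pin a particular class loader, which fits the requirement that the project be cached before String class names are converted into Classes.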
Testing

This requires:

  1. Unit tests
    • Add unit tests to verify that the provided implementation can write and read back projects built from JPA metadata. For the Java serialization implementation, this will be basic testing to ensure that the project serializes out correctly.
    • Unit tests that compare the written-out project to a previously written-out project will be needed to identify when settings are added that prevent the use of projects created/stored with prior EclipseLink versions. This testing will only be used to identify EclipseLink backward compatibility issues with prior stored projects. There is no requirement to prevent these compatibility issues; the goal is to make developers aware of the issues caused and to reduce them.
  2. different test configurations that run against all tests, much like static weaving is currently tested:
    • the agent/weaver reading metadata with the runtime using a cached project
    • the agent/weaver and runtime both using a cached project

Open Issues

    • This was missed because there were no test failures indicating a problem. A string storing the StructConverter implementation class name might be enough but needs looking into.
  1. This feature runs into the same multitenant issues as Metadatasource, such as requiring a different project for different tenants or when weaving.
  2. The contract for building DescriptorEventListeners needs to be laid out in the ProjectCache. It might be better to use a class instead of a list of values where order might be important.
  3. Documentation will be needed to ensure users understand that they may need a way to verify that the version of EclipseLink used to store their projects works with the version used to load them, and how to handle deploying applications with metadata changes that make the cached/stored project obsolete.
  4. Investigate adding an option to store the project while static weaving.