Talk:EclipseLink/UserGuide/JPA/Advanced JPA Development/Data Partitioning
E-mail exchange between Ben Gelernter and James Sutherland, pasted here for convenience (2/28/11):
Answers inline below.
From: Ben Gelernter
Sent: Tuesday, February 22, 2011 3:13 PM
To: James Sutherland
Cc: Douglas Clarke; Peter Krogh; Rick Sapir; Donna Micozzi
Subject: questions about data partitioning
Thanks for your comments on the EclipseLink Data Partitioning topic (http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Advanced_JPA_Development/Data_Partitioning). I have made some of your changes, but I still have some further questions, below.
(For convenience, here's a link to the spec: http://wiki.eclipse.org/EclipseLink/DesignDocs/328937.)
Is the following statement correct?
"Configure data partitioning on an entity, a relationship, a query, a session unit, or a persistence unit by using the @Partitioned annotation and one or more partitioning policy annotations." [James Sutherland] @Partitioned can only be used on an Entity or relationship. For a query, the "eclipselink.partitioned" query hint is used; for a persistence unit, the "eclipselink.partitioned" persistence unit property is used.
Should the above statement be altered or expanded to make it clearer or to add more useful information?
What, if anything, should be said about PartitioningPolicy? Will a developer use this directly? How? (See also question 7, below.) [James Sutherland] PartitioningPolicy is the native API. I assume we are not documenting the native API unless annotations do not exist, so just an API link to the class should be sufficient. Some explanation will be required for the custom partitioning policy, to explain how to subclass the native API class to define your own policy (this is advanced usage, however).
I fixed all the links for the partitioning policies to point to the annotations. Is this the correct list?
@ValuePartitioning [James Sutherland] Yes
What about a Field partitioning policy? I see Javadoc for Class FieldPartitioningPolicy, but I don't see a @FieldPartitioning annotation in the Javadoc. [James Sutherland] This is an abstract class; do not document it.
Regarding support for affinity and Oracle RAC:
In your comments, you said to remove mention of UCP support but to retain Oracle RAC (and Clustered Databases).
What should be said about Oracle RAC and clustered databases, then? The only thing the spec says about RAC is in the context of UCP. ("Oracle RAC is supported through the Oracle JDBC Universal Connection Pool (UCP). UCP supports a single DataSource into the RAC..") [James Sutherland] This release has no specific UCP or RAC support, but could be used with RAC or any clustered database. The key point is that it can be used with RAC or a clustered database. When using a clustered database, replication and unions are not required; a separate connection pool should be defined for each node in the cluster. Partitioning the data in the cluster will improve performance and scalability by not requiring the nodes to ping the current data from other nodes.
The same section in the spec ("RAC/UCP") says "A generic DataPartitioningCallback interface is defined in EclipseLink (platform.database.partitioning) to support integration with an external DataSource data affinity."
I couldn't find DataPartitioningCallback in the Javadocs. For that matter, I couldn't find a platform.database.partitioning package. Am I looking for org.eclipse.persistence.platform.database.partitioning?
Does this need to be documented? [James Sutherland] This is not in 2.2.
In short, I'm not really sure what to say about affinity, RAC, and clustered databases. Should we be providing examples of how to configure for this situation? Can you provide me with such an example or examples?
One of your comments says the Key API box should list the "JPA API first, then the native API" below. Would you kindly tell me which APIs should be listed in each section? [James Sutherland] The JPA API is the annotations; the native API is the PartitioningPolicy classes. Every JPA feature will have both JPA annotations and native API, so we must make this consistent.
Regarding documenting XML:
As I mentioned in last week's meeting, we plan to document XML throughout the doc, as time permits. When I said I'd remove the existing XML doc until we can do a complete pass, Peter said to keep what is already there. So, since this is a new feature, if I can get this XML documented correctly, I'd like to do that. [6.b]
The sample in the spec uses <partitioning-policy>, but I don't see that in eclipselink_orm_2.2.xsd. I see a <partitioning> element used to specify a PartitioningPolicy. That sounds like the @Partitioned annotation. Are they used in the same way? That is, do you use <partitioning> to specify the policy and then use one (or more?) partitioning policy elements to configure the policy (that is, if I got that right in #1, above)? [James Sutherland] Yes. <partitioning-policy> was renamed to <partitioning>; <partitioned> is the same as @Partitioned; <xxx-partitioning> is the same as @XXXPartitioning.
(Note: I didn't find a <partitioned> documented in the XSD, but <partitioned> is listed under <xsd:group name="partitioning-group">, along with all the partitioning policies.)
Is this the correct list of partitioning policy elements?
If so, I don't understand the elements in the example given in the spec, which uses <partitioning-policy>, <round-robin-policy>, <random-policy>, <replication-policy>, or <range-partitioning-policy>. I found none of these in the XSD.
You made the comment:
"example should say what it is an example of, what is it doing, and include a persistence.xml defining the pools used in the example"
Can you please tell me what should be called out about these code examples? Do you think it is enough to say something like the following?
"The following example shows a partitioning policy applied to an entity. The @Partitioned annotation specifies which partitioning policy to use, and the @RangePartitioning annotation configures the range partitioning policy." [James Sutherland] The @RangePartitioning annotation configures the instances of the Project Entity to be partitioned across the three different database instances based on the project's Id. Each database instance will store a different range of the project instances.
If you think it needs to be changed or expanded, please advise.
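As a rough illustration of the range idea James describes (each database instance stores a different range of Project Ids), here is a minimal, self-contained Java sketch of routing by Id range. It deliberately does not use the EclipseLink API. The class name, the pool names "node1" through "node3", and the range boundaries are all assumptions made up for this sketch, not values from the spec or from the example on the page.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a plain-Java stand-in for the decision a range
// partitioning policy makes. Not EclipseLink API. Pool names and range
// boundaries are hypothetical.
class RangeRoutingSketch {
    // Maps the inclusive lower bound of each Id range to a pool name.
    private final TreeMap<Long, String> poolByRangeStart = new TreeMap<>();

    RangeRoutingSketch() {
        poolByRangeStart.put(0L, "node1");        // Ids 0 to 99,999
        poolByRangeStart.put(100_000L, "node2");  // Ids 100,000 to 199,999
        poolByRangeStart.put(200_000L, "node3");  // Ids 200,000 and up
    }

    // Returns the pool whose range contains the given project Id.
    String poolFor(long projectId) {
        Map.Entry<Long, String> entry = poolByRangeStart.floorEntry(projectId);
        if (entry == null) {
            throw new IllegalArgumentException("No range covers Id " + projectId);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RangeRoutingSketch router = new RangeRoutingSketch();
        System.out.println(router.poolFor(42L));
        System.out.println(router.poolFor(150_000L));
        System.out.println(router.poolFor(200_000L));
    }
}
```

In the real feature the ranges would be declared through @RangePartitioning rather than coded by hand; the sketch only shows why each Id ends up on exactly one database instance.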
Can you please provide me with the sample persistence.xml for use with these examples per your suggestion? [James Sutherland] The persistence.xml will need to define each of the connection pools using the new connection pool persistence unit properties.
You made the comment, "API/example for how to write a custom partitioning policy should be given."
I see a CustomPartitioningPolicy in the Javadoc, but the spec says "The user will be able to subclass the PartitionPolicy to provide there own partitioning mechanism." So I'm not sure which path to pursue. Can you advise me, please? Can you provide me with an example? I'll add a section, "Defining a Custom Partitioning Policy," add some explanation (per your instructions), and add the example. [James Sutherland] The @Partitioning annotation takes a class that is a subclass of PartitioningPolicy; it must override the getConnections API.
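To show the shape of "subclass the policy and override the method that picks connections", here is a minimal Java sketch. The base class PolicySketch is a hypothetical stand-in written for this page, not the real PartitioningPolicy; the method name getConnections follows James's wording, but its parameters here are assumptions, so the actual signature should be taken from the Javadoc. The hash-on-key rule and the pool names are likewise only illustrative.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the native base class. NOT the real EclipseLink
// PartitioningPolicy; check the Javadoc for the actual method signature.
abstract class PolicySketch {
    // Returns the names of the pools a request for the given key should use.
    abstract List<String> getConnections(Object key, List<String> poolNames);
}

// A custom policy: picks exactly one pool from the key's hash code.
class HashOnKeyPolicy extends PolicySketch {
    @Override
    List<String> getConnections(Object key, List<String> poolNames) {
        if (poolNames.isEmpty()) {
            throw new IllegalArgumentException("At least one pool is required");
        }
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(key.hashCode(), poolNames.size());
        return Collections.singletonList(poolNames.get(index));
    }
}

class CustomPolicyDemo {
    public static void main(String[] args) {
        List<String> pools = Arrays.asList("node1", "node2", "node3");
        PolicySketch policy = new HashOnKeyPolicy();
        System.out.println(policy.getConnections(7, pools));
    }
}
```

A real custom policy would be registered by name through the @Partitioning annotation, as James notes; that wiring is left out because it needs the EclipseLink library on the classpath.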
I suspect I'm missing important information about connection pools. Can you point me to anything (Javadoc, spec, XSD, whatever) to give me some direction? Or can you advise me directly about what needs to be said about connection pools here? [James Sutherland] Partitioning requires that a named connection pool be defined for each node. The connection pool persistence unit properties are used for this.
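One way to picture "a named connection pool per node" is as a set of persistence unit properties, one group per pool name. The Java sketch below only builds such a property map; it does not call JPA or EclipseLink. The key pattern "eclipselink.connection-pool.<name>.url" is my understanding of the connection pool properties and should be confirmed against the PersistenceUnitProperties Javadoc. The pool names and JDBC URLs are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: builds the property map that would describe one named
// connection pool per node. The key prefix is an assumption to be confirmed
// against the PersistenceUnitProperties Javadoc.
class ConnectionPoolPropertiesSketch {
    static final String POOL_PREFIX = "eclipselink.connection-pool.";

    // Produces one "<prefix><poolName>.url" entry per node.
    static Map<String, String> poolProperties(Map<String, String> urlByPoolName) {
        Map<String, String> props = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : urlByPoolName.entrySet()) {
            props.put(POOL_PREFIX + e.getKey() + ".url", e.getValue());
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("node2", "jdbc:oracle:thin:@node2:1521:orcl"); // hypothetical URL
        nodes.put("node3", "jdbc:oracle:thin:@node3:1521:orcl"); // hypothetical URL
        // Print each generated property as name=value.
        poolProperties(nodes).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The same name/value pairs could equally be written as <property> elements in persistence.xml; the pool names are what the partitioning policy annotations would then refer to.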
P.S., I still don't quite have the hang of how you and the EL team like to work. I'm sending e-mail this time (instead of just adding to the Discussion), because I think it might be a little easier to handle specific questions this way. I'm also copying Peter and Doug (and my manager), in case they want to see the discussion. Should I just copy the whole Dev mailing list? It seems that would be a bit of overkill, for asking specific questions.
- Doc should (**for everything) be in the context of JPA and JPA config (annotations), not the native API (unless no annotations exist).
- annotations for partitioning are in the annotations package
- the "Key API" box should list the JPA API first, then the native API below, ideally in a separate box, to keep them separated
- there is no ucp support in 2.2, remove that section
- should have a section on Oracle RAC and Clustered Databases though, should mention that replication is not required, and a data-source is required per node
- not sure what the configuration files section is supposed to be documenting, we must either document xml for everything, or only doc annotations
- the doc as it is makes no sense
- ideally we would doc both, or at least mention the annotation and xml elements/attributes, would be nice to have an example of both annotations and xml
- named connection pools should be documented separately and referenced from here
- why is deprecated API listed here??? this should be only in the release notes
- example should say what it is an example of, what is it doing, and include a persistence.xml defining the pools used in the example
- API/example for how to write a custom partitioning policy should be given
- James.sutherland.oracle.com 16:48, 16 February 2011 (UTC)
Questions about this documentation (by Ben Gelernter, tech writer):
The partitioning policies in the spec at http://wiki.eclipse.org/EclipseLink/DesignDocs/328937 don't seem to match the partitioning policies in the Javadoc (from the zip download) at zip/eclipselink-2.2.0.v20110114-r8831/eclipselink/eclipselink-javadocs/org/eclipse/persistence/descriptors/partitioning/package-summary.html.
This draft (1/26/11) is based on the Javadoc. Is that correct? That is, this doc should follow the Javadoc, not the spec on this matter, correct?
The differences are:
- CustomPartitioningPolicy and PinnedPartitioningPolicy are in the Javadoc, but not in the spec.
- There are alternative names given for the policies in the spec, under the Functionality section. Shall I assume the names in the Javadoc are correct and the alternative names in the spec should be ignored? Or are they referring to some other kind of entity?
The following table sums up the differences:
|Spec under "Functionality"||Spec under "API"||Javadoc|
|not listed||not listed||CustomPartitioningPolicy|
|not listed||FieldPartitioningPolicy||same as API in spec|
|HashPartitionPolicy||HashPartitioningPolicy||same as API in spec|
|not listed||PartitioningPolicy||same as API in spec|
|RoundRobinPolicy||RoundRobinPartitioningPolicy||same as API in spec|
|RangePartitionPolicy||RangePartitioningPolicy||same as API in spec|
|ReplicationPolicy||ReplicationPartitioningPolicy||same as API in spec|
|UnionPartitionPolicy||UnionPartitioningPolicy||same as API in spec|
|ValuePartitionPolicy||ValuePartitioningPolicy||same as API in spec|
As of build v20100819-r8063, I still don't see any of these APIs in the online Javadoc at http://www.eclipse.org/eclipselink/api/2.2/index.html. Is there a reason for that? When will it appear?
The spec gives an example of an orm.xml file but only has a bullet item for persistence.xml. Should there be an example for persistence.xml, too?
Under "Requirements," the spec lists failover, UCP and RAC as "Phase 2." So, should the "Data Affinity, Oracle RAC, and JDBC UCP Support" section in the current draft be removed?
And nothing needs to be said about failover for this release, correct?
I'm not sure what specifically to say about the listed connection pool properties in this context. Can anyone offer some choice phrases for an introduction?