Difference between revisions of "EclipseLink/UserGuide/JPA/Advanced JPA Development/Data Partitioning"

Revision as of 14:39, 3 May 2012

EclipseLink JPA


Data Partitioning

Data partitioning allows an application to scale its data across more than one database machine. EclipseLink supports data partitioning at the entity level, so that different sets of entity instances of the same class can be stored in different physical databases or on different nodes within a database cluster. Both regular and clustered databases are supported. Data can be partitioned both horizontally and vertically.

Partitioning can be enabled on an entity, a relationship, a query, or a persistence unit.

Partitioning Policies

To configure data partitioning, use the @Partitioned annotation and one or more partitioning policy annotations. The annotations for defining the different kinds of policies are:

  • @HashPartitioning - Partitions access to a database cluster by the hash of a field value from the object, such as the object's ID, location, or tenant. The hash indexes into the list of connection pools/nodes. All write or read requests for objects with that hash value are sent to the same server. If a query does not include the hash field as a parameter, it can be sent to all servers and unioned, or it can be left to the session's default behavior.
  • @PinnedPartitioning - Pins requests to a single connection pool/node. This allows for vertical partitioning.
  • @RangePartitioning - Partitions access to a database cluster by a field value from the object, such as the object's ID, location, or tenant. Each server is assigned a range of values. All write or read requests for objects with a value in that range are sent to the same server. If a query does not include the field as a parameter, it can be sent to all servers and unioned, or it can be left to the session's default behavior.
  • @ReplicationPartitioning - Sends requests to a set of connection pools/nodes. This policy is for replicating data across a cluster of database machines. Only modification queries are replicated.
  • @RoundRobinPartitioning - Sends requests in a round-robin fashion to the set of connection pools/nodes. It is for load balancing read queries across a cluster of database machines. It requires that the full database be replicated on each machine, so it does not support partitioning. The data should either be read-only, or writes should be replicated.
  • @UnionPartitioning - Sends queries to all connection pools and unions the results. This is for queries or relationships that span partitions when partitioning is used, such as a ManyToMany cross-partition relationship.
  • @ValuePartitioning - Partitions access to a database cluster by a field value from the object, such as the object's location or tenant. Each value is assigned a specific server. All write or read requests for objects with that value are sent to the same server. If a query does not include the field as a parameter, then it can be sent to all servers and unioned, or it can be left to the session's default behavior.
  • @Partitioning - Partitions access to a database cluster by a custom partitioning policy. A PartitioningPolicy class must be provided and implemented.
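
The routing idea behind @HashPartitioning can be sketched in plain Java. This is only an illustration under stated assumptions, not EclipseLink's implementation: the class HashRouter, its method poolsFor, and the pool names are hypothetical. A known partition value is hashed into the list of pools; when the query carries no partition value, the sketch returns every pool, which corresponds to sending the query to all servers and unioning the results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: picks connection pools by hashing a partition field value. */
class HashRouter {
    private final List<String> pools;

    HashRouter(List<String> pools) {
        if (pools.isEmpty()) {
            throw new IllegalArgumentException("at least one connection pool is required");
        }
        this.pools = Collections.unmodifiableList(new ArrayList<>(pools));
    }

    /**
     * Returns the pools a request should be sent to: exactly one pool when the
     * partition value is known, or all pools (to be unioned) when it is null.
     */
    List<String> poolsFor(Object partitionValue) {
        if (partitionValue == null) {
            return pools;
        }
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(partitionValue.hashCode(), pools.size());
        return Collections.singletonList(pools.get(index));
    }
}
```

Because the pool index depends only on the value's hash and the size of the pool list, every read or write for the same value lands on the same pool, as long as the pool list itself does not change.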


Partitioning policies are globally named objects in a persistence unit and can be reused across multiple descriptors or queries. This simplifies configuration, particularly with JPA annotations and XML.

The persistence unit properties support adding named connection pools in addition to the existing configuration for read/write/sequence. A named connection pool must be defined for each node in the database cluster.
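
As a minimal sketch of what defining one named pool per node might look like, the following helper builds persistence unit properties using the eclipselink.connection-pool.<name>.* key pattern. The class name PartitionedPoolProperties, the pool sizes, and any JDBC URLs passed in are hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds named connection pool properties, one pool per cluster node. */
class PartitionedPoolProperties {

    /** @param urlsByPoolName pool name (for example "node2") mapped to that node's JDBC URL */
    static Map<String, String> forNodes(Map<String, String> urlsByPoolName) {
        Map<String, String> props = new LinkedHashMap<>();
        for (Map.Entry<String, String> node : urlsByPoolName.entrySet()) {
            String prefix = "eclipselink.connection-pool." + node.getKey() + ".";
            props.put(prefix + "url", node.getValue());
            // Pool sizes below are arbitrary placeholder values.
            props.put(prefix + "initial", "1");
            props.put(prefix + "min", "1");
            props.put(prefix + "max", "8");
        }
        return props;
    }
}
```

The resulting map could then be supplied when the persistence unit is created, for example through Persistence.createEntityManagerFactory(unitName, properties), or the same keys could be written as property elements in persistence.xml.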

If a transaction modifies data from multiple partitions, JTA should be used to ensure 2-phase commit of the data. An exclusive connection can also be configured in the EntityManager to ensure only a single node is used for a single transaction.
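
One way an exclusive connection might be requested is through the eclipselink.jdbc.exclusive-connection.mode property. The sketch below only assembles the property map; the class name is hypothetical, and the choice of the "Always" mode is an assumption that should be checked against the needs of the application.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: properties asking EclipseLink for an exclusive connection. */
class ExclusiveConnectionProperties {

    static Map<String, Object> alwaysExclusive() {
        Map<String, Object> props = new HashMap<>();
        // Assumption: "Always" is the mode wanted here; other modes exist.
        props.put("eclipselink.jdbc.exclusive-connection.mode", "Always");
        return props;
    }
}
```

Such a map could be passed to EntityManagerFactory.createEntityManager(Map) so that the setting applies to a single EntityManager rather than to the whole persistence unit.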

Clustered Databases and Oracle RAC

Some databases support clustering the database across multiple machines. Oracle RAC allows a single database to span multiple server nodes. Oracle RAC also supports table and node partitioning of data. A database cluster allows any of the data to be accessed from any node in the cluster. However, it is generally more efficient to partition data access to specific nodes, to reduce cross-node communication.

EclipseLink partitioning can be used in conjunction with a clustered database to reduce cross-node communication and improve scalability.

To use partitioning with a database cluster, the following is required:

  • The partitioning policy should not enable replication, because the database cluster makes data available to all nodes.
  • The partitioning policy should not use unions, because the database cluster returns the complete query result from any node.
  • A DataSource and an EclipseLink connection pool should be defined for each node in the cluster.
  • The application's data access and data partitioning should be designed so that each transaction requires access to only a single node.
  • Using an exclusive connection for an EntityManager is recommended, to avoid having multiple nodes in a single transaction and to avoid two-phase commit.

Data Partitioning Examples

This example partitions the Employee data by location. The two primary sites, Ottawa and Toronto, are each stored on a separate database. All other locations are stored on the default database.

Project is range partitioned by its ID. Each range of ID values is stored on a different database.

The employee/project relationship is an example of a cross-partition relationship. To allow the employees and projects to be stored on different databases, a union policy is used and the join table is replicated to each database.

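// Employee.java
import static javax.persistence.CascadeType.MERGE;
import static javax.persistence.CascadeType.PERSIST;

import java.util.Collection;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.IdClass;
import javax.persistence.ManyToMany;

import org.eclipse.persistence.annotations.Partitioned;
import org.eclipse.persistence.annotations.UnionPartitioning;
import org.eclipse.persistence.annotations.ValuePartition;
import org.eclipse.persistence.annotations.ValuePartitioning;

@Entity
@IdClass(EmployeePK.class)
// Union policy for the cross-partition relationship; writes are replicated to every node.
@UnionPartitioning(
        name="UnionPartitioningAllNodes",
        replicateWrites=true)
// Ottawa and Toronto each get their own node; every other location uses the default pool.
@ValuePartitioning(
        name="ValuePartitioningByLOCATION",
        partitionColumn=@Column(name="LOCATION"),
        unionUnpartitionableQueries=true,
        defaultConnectionPool="default",
        partitions={
            @ValuePartition(connectionPool="node2", value="Ottawa"),
            @ValuePartition(connectionPool="node3", value="Toronto")
        })
@Partitioned("ValuePartitioningByLOCATION")
public class Employee {
    @Id
    @Column(name = "EMP_ID")
    private Integer id;

    // Part of the composite ID, and the column the entity is partitioned by.
    @Id
    private String location;
    ...

    // The join table is queried on, and written to, all nodes.
    @ManyToMany(cascade = { PERSIST, MERGE })
    @Partitioned("UnionPartitioningAllNodes")
    private Collection<Project> projects;
    ...
}

// Project.java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;

import org.eclipse.persistence.annotations.Partitioned;
import org.eclipse.persistence.annotations.RangePartition;
import org.eclipse.persistence.annotations.RangePartitioning;

@Entity
// Each range of PROJ_ID values is stored on a different node.
@RangePartitioning(
        name="RangePartitioningByPROJ_ID",
        partitionColumn=@Column(name="PROJ_ID"),
        partitionValueType=Integer.class,
        unionUnpartitionableQueries=true,
        partitions={
            @RangePartition(connectionPool="default", startValue="0", endValue="1000"),
            @RangePartition(connectionPool="node2", startValue="1000", endValue="2000"),
            @RangePartition(connectionPool="node3", startValue="2000")
        })
@Partitioned("RangePartitioningByPROJ_ID")
public class Project {
    @Id
    @Column(name="PROJ_ID")
    private Integer id;
    ...
}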
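
To make the routing in this example easier to follow, here is a plain-Java sketch of how requests might be mapped to connection pools. It is a model of the configuration above, not EclipseLink code: the class ExampleRouting and its methods are hypothetical, and the range check assumes that both bounds are inclusive and that the first matching range wins, which matters for the boundary values 1000 and 2000 that appear in two ranges.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of the value and range routing configured in the example. */
class ExampleRouting {

    /** Mirrors ValuePartitioningByLOCATION: listed values map to a node, all others to "default". */
    static String poolForLocation(String location) {
        Map<String, String> partitions = new LinkedHashMap<>();
        partitions.put("Ottawa", "node2");
        partitions.put("Toronto", "node3");
        return partitions.getOrDefault(location, "default");
    }

    /**
     * Mirrors RangePartitioningByPROJ_ID. Assumption: bounds are inclusive and
     * ranges are checked in declaration order. An empty result means no range matched.
     */
    static Optional<String> poolForProjectId(int id) {
        if (id >= 0 && id <= 1000) {
            return Optional.of("default");
        }
        if (id >= 1000 && id <= 2000) {
            return Optional.of("node2");
        }
        if (id >= 2000) {
            return Optional.of("node3");
        }
        return Optional.empty();
    }
}
```

Under these assumptions, an Employee located in Ottawa is read from and written to node2, while a Project with ID 1500 is stored on node2 and one with ID 2500 on node3. A query that does not supply LOCATION or PROJ_ID cannot be routed this way, which is where unionUnpartitionableQueries=true applies.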


Version: 2.2.0 DRAFT