EclipseLink/UserGuide/JPA/Advanced JPA Development/Data Partitioning

EclipseLink JPA


Data Partitioning

With data partitioning, you can subdivide a database table, index or index-organized table into smaller units. That makes it possible to manage and access those objects at a finer level of granularity, thereby improving manageability, performance, and availability. For example, data partitioning facilitates load-balancing and replicating data across multiple different databases or across a database cluster.

Partitioning can be enabled on an entity, a relationship, a query, a session, or a persistence unit.

Partitioning Policies

To configure data partitioning, use the @Partitioned annotation and one or more partitioning policy annotations. The annotations for defining the different kinds of policies are:

  • @HashPartitioning - Partitions access to a database cluster by the hash of a field value from the object, such as the object's location or tenant. The hash indexes into the list of connection pools. All write or read requests for objects with that hash value are sent to the same server. If a query does not include the field as a parameter, it can be sent to all servers and unioned, or it can be left to the session's default behavior.
  • @Partitioning - Partitions the data for a class across multiple different databases or across a database cluster such as Oracle RAC. Partitioning can provide improved scalability by allowing multiple database machines to service requests. This annotation configures a custom partitioning policy.
  • @RangePartitioning - Partitions access to a database cluster by a field value from the object, such as the object's ID, location, or tenant. Each server is assigned a range of values. All write or read requests for objects with a value in that range are sent to the corresponding server. If a query does not include the field as a parameter, then it can either be sent to all servers and unioned, or left to the session's default behavior. If multiple partitions are used to process a single transaction, JTA should be used for proper XA transaction support.
  • @ReplicationPartitioning - Sends requests to a set of connection pools. This policy is for replicating data across a cluster of database machines. Only modification queries are replicated.
  • @RoundRobinPartitioning - Sends requests in a round-robin fashion to the set of connection pools. It is for load balancing read queries across a cluster of database machines. It requires that the full database be replicated on each machine, so it does not support partitioning. The data should either be read-only, or writes should be replicated on the database.
  • @UnionPartitioning - Sends queries to all connection pools and unions the results. This is for queries or relationships that span partitions when partitioning is used, such as a ManyToMany relationship that crosses partitions.
  • @ValuePartitioning - Partitions access to a database cluster by a field value from the object, such as the object's location or tenant. Each value is assigned a specific server. All write or read requests for objects with that value are sent to the server. If a query does not include the field as a parameter, then it can be sent to all servers and unioned, or it can be left to the session's default behavior.
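
The routing rules described for the hash, range, and value policies can be pictured with a small stand-alone model. This is only a sketch for intuition, not EclipseLink's implementation: the class `PartitionRoutingSketch`, its method names, and the pool names are assumptions made for illustration, and the half-open ranges are a simplification.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Illustrative model of partition routing; not EclipseLink code. */
final class PartitionRoutingSketch {

    // Hypothetical connection pool names.
    static final List<String> POOLS = List.of("default", "node2", "node3");

    /** Hash routing: the hash of the field value indexes into the pool list. */
    static String hashPool(Object fieldValue) {
        int index = Math.floorMod(Objects.hashCode(fieldValue), POOLS.size());
        return POOLS.get(index);
    }

    /** Range routing: each pool owns a contiguous range of values. */
    static String rangePool(int id) {
        if (id < 1000) {
            return "default";
        }
        if (id < 2000) {
            return "node2";
        }
        return "node3";
    }

    /** Value routing: each known value maps to one pool, with a fallback pool. */
    static String valuePool(String location) {
        Map<String, String> poolByValue = Map.of("Ottawa", "node2", "Toronto", "node3");
        return poolByValue.getOrDefault(location, "default");
    }

    public static void main(String[] args) {
        System.out.println(hashPool(42));
        System.out.println(rangePool(1500));
        System.out.println(valuePool("Ottawa"));
    }
}
```

In each case the same field value always maps to the same pool, which is why a query that omits the partitioning field cannot be routed to a single server.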

All policies provide an exclusive connection option. This assigns an accessor to the client session on the first query execution and uses that connection for the duration of the session. This ensures that the entire transaction stays on the same node.
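
As a rough mental model of the exclusive connection option, the session can be thought of as pinning whichever pool the policy selects first. The class below is a hypothetical illustration (the name `ExclusiveConnectionSketch` and its method are not part of EclipseLink) and ignores real connection handling.

```java
/** Illustrative model of an exclusive (pinned) connection; not EclipseLink code. */
final class ExclusiveConnectionSketch {

    private String pinnedPool; // null until the first query runs

    /** Returns the pool to use, pinning the first choice for the rest of the session. */
    String poolFor(String poolChosenByPolicy) {
        if (pinnedPool == null) {
            pinnedPool = poolChosenByPolicy;
        }
        return pinnedPool;
    }

    public static void main(String[] args) {
        ExclusiveConnectionSketch session = new ExclusiveConnectionSketch();
        System.out.println(session.poolFor("node2")); // first query pins node2
        System.out.println(session.poolFor("node3")); // later queries stay on node2
    }
}
```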

Partitioning policies are globally-named objects in a persistence unit and are reusable across multiple descriptors or queries. This improves the usability of the configuration, specifically with JPA annotations and XML.

The persistence unit properties support adding named connection pools in addition to the existing configuration for read/write/sequence.
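
A sketch of what named connection pool properties might look like when supplied programmatically follows. It assumes the property key pattern `eclipselink.connection-pool.<name>.<setting>` with `url`, `min`, and `max` settings; verify the exact keys against the `PersistenceUnitProperties` Javadoc for your EclipseLink version. The host names, pool sizes, and the class name `NamedPoolPropertiesSketch` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds example properties for two extra named pools; keys are assumed, values are placeholders. */
final class NamedPoolPropertiesSketch {

    static Map<String, String> namedPoolProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        for (String pool : new String[] {"node2", "node3"}) {
            // Assumed key pattern: eclipselink.connection-pool.<pool name>.<setting>
            String prefix = "eclipselink.connection-pool." + pool + ".";
            // Hypothetical host names and pool sizes; replace with real values.
            props.put(prefix + "url", "jdbc:oracle:thin:@" + pool + ".example.com:1521:orcl");
            props.put(prefix + "min", "4");
            props.put(prefix + "max", "16");
        }
        return props;
    }

    public static void main(String[] args) {
        namedPoolProperties().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

A map like this would typically be passed as the second argument to `Persistence.createEntityManagerFactory(unitName, properties)`, or the same keys could be written as `<property>` elements in persistence.xml.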

Data Affinity and Oracle RAC Support

Some cluster-enabled databases define their own cluster-aware DataSource implementation. Some of these also support data affinity, and EclipseLink can integrate with such a data affinity service.

Oracle RAC is supported.

A DataSource is required for every node.

A generic DataPartitioningCallback interface is defined in EclipseLink (platform.database.partitioning) to support integration with an external DataSource's data affinity. The callback is given a chance to register itself with the DataSource on connect. The partitioning policies then set the partition ID into the callback instead of acquiring a connection from a connection pool.

Replication is not required.
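
The callback flow described above (register once on connect, then receive a partition ID per request) can be sketched with a simplified stand-in. The interface `AffinityCallbackSketch` below is hypothetical and is not the EclipseLink `DataPartitioningCallback` itself; take the real method signatures from the EclipseLink Javadoc.

```java
/** Hypothetical stand-in for a data affinity callback; not the EclipseLink interface. */
interface AffinityCallbackSketch {
    void register(Object dataSource);      // called once, when the session connects
    void setPartitionId(int partitionId);  // called by a partitioning policy per request
}

/** Records what it is told, so the flow can be observed without a real DataSource. */
final class RecordingAffinityCallback implements AffinityCallbackSketch {

    private Object registeredWith;
    private int partitionId = -1; // -1 means no partition has been selected yet

    @Override
    public void register(Object dataSource) {
        this.registeredWith = dataSource;
    }

    @Override
    public void setPartitionId(int partitionId) {
        // A real callback would pass this hint to the cluster-aware DataSource.
        this.partitionId = partitionId;
    }

    boolean isRegistered() {
        return registeredWith != null;
    }

    int currentPartitionId() {
        return partitionId;
    }
}
```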

Data Partitioning Examples

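import static javax.persistence.CascadeType.MERGE;
import static javax.persistence.CascadeType.PERSIST;

import java.util.Collection;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.IdClass;
import javax.persistence.ManyToMany;
import org.eclipse.persistence.annotations.Partitioned;
import org.eclipse.persistence.annotations.UnionPartitioning;
import org.eclipse.persistence.annotations.ValuePartition;
import org.eclipse.persistence.annotations.ValuePartitioning;

@Entity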
@IdClass(EmployeePK.class)
@UnionPartitioning(
        name="UnionPartitioningAllNodes",
        replicateWrites=true)
@ValuePartitioning(
        name="ValuePartitioningByLOCATION",
        partitionColumn=@Column(name="LOCATION"),
        unionUnpartitionableQueries=true,
        defaultConnectionPool="default",
        partitions={
            @ValuePartition(connectionPool="node2", value="Ottawa"),
            @ValuePartition(connectionPool="node3", value="Toronto")
        })
@Partitioned("ValuePartitioningByLOCATION")
public class Employee {
    @Id
    @Column(name = "EMP_ID")
    private Integer id;
 
    @Id
    private String location;
    ...
 
    @ManyToMany(cascade = { PERSIST, MERGE })
    @Partitioned("UnionPartitioningAllNodes")
    private Collection<Project> projects;
    ...
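}

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.eclipse.persistence.annotations.Partitioned;
import org.eclipse.persistence.annotations.RangePartition;
import org.eclipse.persistence.annotations.RangePartitioning;

@Entity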
@RangePartitioning(
        name="RangePartitioningByPROJ_ID",
        partitionColumn=@Column(name="PROJ_ID"),
        partitionValueType=Integer.class,
        unionUnpartitionableQueries=true,
        partitions={
            @RangePartition(connectionPool="default", startValue="0", endValue="1000"),
            @RangePartition(connectionPool="node2", startValue="1000", endValue="2000"),
            @RangePartition(connectionPool="node3", startValue="2000")
        })
@Partitioned("RangePartitioningByPROJ_ID")
public class Project {
    @Id
    @Column(name="PROJ_ID")
    private Integer id;
    ...
}


Version: 2.2.0 DRAFT