Difference between revisions of "SMILA/Specifications/RecordStorage"
Revision as of 13:05, 2 February 2009
!!! UNDER CONSTRUCTION !!!
Contents
- 1 Description
- 2 RecordStorage Interface
- 3 RecordStorage based on JDBC database
- 4 RecordStorage based on relational database using eclipseLink
- 5 Serialisation of Records
- 6 PoC Blackboard using RecordStorage instead of XMLStorage
- 7 Links
Description
As Berkeley XML DB will not be available in Eclipse in the near future, we need an open source alternative to store record metadata. There is no requirement to use only an XML database; any storage that allows us to persist record metadata will suffice.
RecordStorage Interface
Here is a proposal for a RecordStorage interface. It contains only the basic functionality without any query support.
interface RecordStorage {
  Record loadRecord(Id id);
  void storeRecord(Record record);
  void removeRecord(Id id);
  boolean existsRecord(Id id);
}
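To make the contract of the proposed interface concrete, here is a minimal in-memory sketch. The `Id` and `Record` classes are simplified placeholders invented for this example (the real SMILA datamodel classes are richer), the store method is spelled `storeRecord` here, and `InMemoryRecordStorage` is a hypothetical name, not part of SMILA.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Placeholder for the SMILA record ID; only a String form is modelled here.
final class Id {
    private final String value;

    Id(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Id && ((Id) other).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}

// Placeholder for the SMILA record: an ID plus opaque metadata.
final class Record {
    private final Id id;
    private final String metadata;

    Record(Id id, String metadata) {
        this.id = id;
        this.metadata = metadata;
    }

    Id getId() {
        return id;
    }

    String getMetadata() {
        return metadata;
    }
}

interface RecordStorage {
    Record loadRecord(Id id);
    void storeRecord(Record record);
    void removeRecord(Id id);
    boolean existsRecord(Id id);
}

// Keeps all records in a map keyed by record ID; nothing is persisted.
final class InMemoryRecordStorage implements RecordStorage {
    private final Map<Id, Record> records = new ConcurrentHashMap<>();

    @Override
    public Record loadRecord(Id id) {
        return records.get(id); // null if no record is stored under this ID
    }

    @Override
    public void storeRecord(Record record) {
        records.put(record.getId(), record); // overwrites an existing record
    }

    @Override
    public void removeRecord(Id id) {
        records.remove(id);
    }

    @Override
    public boolean existsRecord(Id id) {
        return records.containsKey(id);
    }
}
```

Returning null for an unknown ID and overwriting on store are choices made for this sketch only; the proposal itself does not specify either behaviour.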
RecordStorage based on JDBC database
At the moment all access to records is based on the record ID, which is the primary key when reading and writing records. The records could therefore easily be stored in a relational database, using a single table with two columns: one for the record ID (the primary key) and one for the record itself. The record could be stored as a BLOB or CLOB:
- BLOB: the record is simply serialized into a byte[] and stored as a BLOB
- CLOB: the record's XML representation is stored in a CLOB. Extra method calls to convert the record to and from XML are needed when reading and writing records (a performance cost compared to using a BLOB). In return, this offers some options to include WHERE clauses that access the CLOB in SQL queries
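The BLOB variant can be sketched with standard Java serialization. This assumes the record class implements `java.io.Serializable`; `BlobCodec` and its method names are hypothetical helpers for illustration, not SMILA classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Converts a serializable record to the byte[] stored in the BLOB column and back.
final class BlobCodec {
    private BlobCodec() {
    }

    static byte[] toBytes(Serializable record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(record);
        } catch (IOException e) {
            throw new UncheckedIOException("could not serialize record", e);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] blob) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException("could not deserialize record", e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("record class not on the classpath", e);
        }
    }
}
```

The resulting array could then be bound to the BLOB column with `PreparedStatement.setBytes` and read back with `ResultSet.getBytes`.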
Because the String representation of IDs can be very long, an alternative is to store a hash of the String as the key. (This hash has to be computed on every database access.) In addition, another column could store the source attribute of the record ID. This would allow easy access to all records of a data source, to handle the use case "reindexing without crawling".
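A fixed-length key could be derived from the ID String as sketched below. The choice of SHA-256 and hex encoding is an assumption made for this example (the text only asks for "a hash"), and `IdHasher` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class IdHasher {
    private IdHasher() {
    }

    // Returns the SHA-256 digest of the ID String as 64 lowercase hex characters.
    static String hashId(String idString) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(idString.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

With such a key the primary key column can have a fixed width of 64 characters, regardless of how long the original ID String is.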
For advanced use cases (e.g. Mashup) query support is needed (compare XQJ), e.g. to select all records of a certain MIME type. It would be possible to add more columns or join tables for selected record attributes. Another option is to postprocess the selected records, filtering out those that do not match the query filter. This is functionally equivalent to an SQL SELECT, but performance is of course much worse because every candidate record has to be loaded first.
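The postprocessing option amounts to filtering already loaded records in memory. In the sketch below a record is modelled as a plain attribute map, which is a simplification for this example; `RecordFilter` and the attribute name `mimeType` are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class RecordFilter {
    private RecordFilter() {
    }

    // Keeps only the records whose given attribute has the expected value.
    static List<Map<String, String>> filterByAttribute(
            List<Map<String, String>> records, String attribute, String expected) {
        return records.stream()
                .filter(record -> expected.equals(record.get(attribute)))
                .collect(Collectors.toList());
    }
}
```

Every record passed in has already been read from the database and deserialized, which is where the performance cost compared to a WHERE clause comes from.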
When implementing a JDBC RecordStorage one should take care to use database-neutral SQL statements, or make the statements configurable. A good practice could be to implement reading and writing in DAO objects, so that database-specific implementations of the DAOs can be provided to make use of special features. Most databases offer improved support for BLOBs and CLOBs.
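One way to keep the statements configurable is to load them from properties with database-neutral defaults, and to let a DAO execute them. The table and column names (`SMILA_RECORDS`, `RECORD_ID`, `RECORD_DATA`), the property keys, and the classes `RecordSql` and `RecordDao` are all assumptions made for this sketch.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Properties;

// Holds the SQL statements; each default can be overridden per database.
final class RecordSql {
    static final String DEFAULT_SELECT =
            "SELECT RECORD_DATA FROM SMILA_RECORDS WHERE RECORD_ID = ?";
    static final String DEFAULT_INSERT =
            "INSERT INTO SMILA_RECORDS (RECORD_ID, RECORD_DATA) VALUES (?, ?)";
    static final String DEFAULT_DELETE =
            "DELETE FROM SMILA_RECORDS WHERE RECORD_ID = ?";

    private final Properties overrides;

    RecordSql(Properties overrides) {
        this.overrides = overrides;
    }

    String select() {
        return overrides.getProperty("sql.select", DEFAULT_SELECT);
    }

    String insert() {
        return overrides.getProperty("sql.insert", DEFAULT_INSERT);
    }

    String delete() {
        return overrides.getProperty("sql.delete", DEFAULT_DELETE);
    }
}

// Plain JDBC DAO; a database-specific subclass or sibling could replace it.
final class RecordDao {
    private final RecordSql sql;

    RecordDao(RecordSql sql) {
        this.sql = sql;
    }

    // Returns the stored bytes, or null if no row exists for the ID.
    byte[] load(Connection connection, String id) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(sql.select())) {
            statement.setString(1, id);
            try (ResultSet result = statement.executeQuery()) {
                return result.next() ? result.getBytes(1) : null;
            }
        }
    }

    void insert(Connection connection, String id, byte[] blob) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(sql.insert())) {
            statement.setString(1, id);
            statement.setBytes(2, blob);
            statement.executeUpdate();
        }
    }

    void delete(Connection connection, String id) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(sql.delete())) {
            statement.setString(1, id);
            statement.executeUpdate();
        }
    }
}
```

An update of an existing record is left out on purpose, because "insert or update" is exactly the kind of operation whose SQL differs between databases and would belong in a database-specific DAO.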
A good choice for an open source database is Apache Derby. The Apache License 2.0 is compatible with the EPL, the database has a small footprint (2 MB) and can be used in-process as well as in client/server network mode. It has also already been committed to Orbit. For a production environment it would be easy to switch to any other JDBC database, such as Oracle.
RecordStorage based on relational database using eclipseLink
EclipseLink offers various options to persist Java objects. Below we go into detail about using EclipseLink with JPA (Java Persistence API):
Overview on JPA
A mapping of Java classes to a relational database schema is created by using annotations in Java code or by providing an XML configuration. The classes to be persisted (called entities) are in general represented by database tables, and their member variables by columns in those tables. There are some requirements to be met:
- an entity class must provide a no-argument constructor (either public or protected)
- entity classes must be top-level classes, not enums or interfaces
There are two kinds of access types, of which only one can be used per entity:
- field based: direct access to member variables
- property based: JavaBean-like access via getter and setter methods
An entity must have a unique ID. This can be one of:
- a simple key (just one member variable) (@Id)
- a composite key using multiple member variables. This implies the use of an additional primary key class that contains the same member variables (same name and type) as the entity class (@Id + @IdClass)
- an embedded key (@EmbeddedId)
Entities can have relations to other entities or contain embedded classes. Embedded classes are not entities themselves (but must meet the same requirements) and do not have a unique ID. They "belong" to the entity object embedding them. Version 1.0 of the JPA specification requires support for only one level of embedded objects; whether more levels are supported depends on the implementation. Collections are also not allowed as embedded classes.
For more information see ejb-3_0-fr-spec-persistence.pdf.
PoC eclipseLink JPA RecordStorage using Oracle DB and Smila datamodel
PoC eclipseLink JPA RecordStorage using Oracle DB and Dao Objects for Smila datamodel
Enhanced Dao classes for restricted selections of attributes
PoC eclipseLink JPA RecordStorage using Derby DB and Dao Objects for Smila Datamodel
Serialisation of Records
PoC Blackboard using RecordStorage instead of XMLStorage
Links
- [http://db.apache.org/derby/] (Derby homepage)
- [http://wiki.eclipse.org/EclipseLink/] (EclipseLink homepage)
- [http://jcp.org/en/jsr/detail?id=220] (JPA Specification)