SMILA/Documentation/Xml Storage::Implementation::Berkley XML DB

XML data storage

Berkeley DB XML

Oracle Berkeley DB XML is an open source, embeddable XML database with XQuery-based access to documents stored in containers and indexed based on their content. Oracle Berkeley DB XML is built on top of Oracle Berkeley DB and inherits its rich features and attributes. Like Oracle Berkeley DB, it runs in process with the application with no need for human administration. Oracle Berkeley DB XML adds a document parser, XML indexer and XQuery engine on top of Oracle Berkeley DB to enable the fastest, most efficient retrieval of data.

The attached docs from Oracle can be found [| here]

Key Features and Limitations

  • Replication
    • a single read/write master and many read-only slaves
    • limited to 60 replication nodes on Windows and 1000 on Unix
    • if the master dies, a new master can be elected automatically
  • XML Encodings
    • ASCII, UTF-8, UTF-16 (big/little endian), UCS4 (big/little endian), the EBCDIC code pages IBM037, IBM1047 and IBM1140, ISO-8859-1 (aka Latin1) and Windows-1252.

See: http://xerces.apache.org/xerces-c/faq-parse.html#faq-21 and http://www.oracle.com/technology/products/berkeley-db/faq/xml_faq.html#1

  • Segmentation
    • XML documents can be grouped by whatever characteristic you deem right, either into the same container or into separate ones.
    • A single query can span more than one container at the same time.
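
A query over several containers can be sketched as an XQuery expression that unions one collection() call per container. The helper below is only an illustration under stated assumptions: the class MultiContainerQuery, the container names and the path are hypothetical, and the helper merely builds the query string; it is not part of the Berkeley DB XML API, and the containers would still have to be opened and resolvable by the application.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative helper (hypothetical, not part of the BDB XML API). */
final class MultiContainerQuery {

    private MultiContainerQuery() {}

    /**
     * Builds an XQuery expression that applies one path to the union of
     * several containers.
     *
     * @param containers container names (assumed examples, e.g. "a.dbxml")
     * @param path       path applied to the union, e.g. "/record"
     */
    static String build(List<String> containers, String path) {
        if (containers.isEmpty()) {
            throw new IllegalArgumentException("at least one container is required");
        }
        // One collection("...") call per container, joined with the XQuery union operator.
        String union = containers.stream()
                .map(name -> "collection(\"" + name + "\")")
                .collect(Collectors.joining(" | "));
        return "(" + union + ")" + path;
    }
}
```

For two hypothetical containers a.dbxml and b.dbxml and the path /record, build returns (collection("a.dbxml") | collection("b.dbxml"))/record, which could then be handed to the query engine.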

Implementation Ideas

Scenario 1: Parallel Access from Different Clients on the Same Host

It is possible to configure BDB so that several client processes can access the data concurrently by sharing the underlying database files. This is called environment sharing. For this to work, BDB must be configured with transaction control activated, as described in [| BerkeleyDBXML-Txn-JAVA.pdf].

However, this approach is limited to different clients on the same host. Placing the files on a shared resource such as a SAN or NFS is explicitly not a valid solution for accessing the same data from different hosts (see http://www.oracle.com/technology/products/berkeley-db/faq/db_faq.html#30).

Given the targeted environment of a distributed system, this scenario seems an unlikely use case for EILF. It could only make sense if all of the following conditions are met (IMHO this is unlikely to be the case):

  • the client's pre-processing overhead for storing the XML data is relatively large
  • this pre-processing runs before transactional synchronization
  • parallelization with VM threads is less efficient than with native processes.
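
The first two conditions can be pictured with a small sketch in which workers pre-process documents in parallel and only the final store step is serialized. It uses plain VM threads, i.e. the alternative that the third condition compares against native processes. Everything here is an assumption made for illustration: the class PreprocessThenStore, the trivial preprocess step and the in-memory list standing in for the XML store are hypothetical and do not use Berkeley DB XML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative sketch: parallel pre-processing, serialized store step. */
final class PreprocessThenStore {

    // Hypothetical stand-in for the XML store.
    private final List<String> store = new ArrayList<>();

    /** Placeholder for the expensive client-side work done before storing. */
    static String preprocess(String raw) {
        return "<doc>" + raw.trim() + "</doc>";
    }

    /** Only this step is serialized; it stands in for transactional synchronization. */
    private synchronized void storeDocument(String xml) {
        store.add(xml);
    }

    /** Pre-processes all inputs on a pool of VM threads and returns the stored documents. */
    List<String> run(List<String> rawInputs, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String raw : rawInputs) {
                // preprocess() runs concurrently; storeDocument() is the synchronized part.
                pending.add(pool.submit(() -> storeDocument(preprocess(raw))));
            }
            for (Future<?> task : pending) {
                task.get(); // wait for completion and surface failures
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while storing documents", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("pre-processing or storing failed", e);
        } finally {
            pool.shutdown();
        }
        synchronized (this) {
            return new ArrayList<>(store);
        }
    }
}
```

The order of the stored documents depends on thread scheduling. If pre-processing is cheap compared to the synchronized store step, the parallel workers gain little, which is the reason the first condition matters.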
