SMILA/Documentation/Xml Storage::Implementation::Berkley XML DB

XML data storage

Berkeley DB XML

Oracle Berkeley DB XML is an open source, embeddable XML database with XQuery-based access to documents stored in containers and indexed based on their content. Oracle Berkeley DB XML is built on top of Oracle Berkeley DB and inherits its rich features and attributes. Like Oracle Berkeley DB, it runs in process with the application with no need for human administration. Oracle Berkeley DB XML adds a document parser, XML indexer and XQuery engine on top of Oracle Berkeley DB to enable the fastest, most efficient retrieval of data.

The attached docs from Oracle can be found [| here]

Key Features and Limitations

Replication
- a single read/write master and many read-only slaves
- limited to 60 replication nodes on Windows and 1000 on Unix
- if the master dies, a new master can be elected automatically
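
To illustrate the automatic election mentioned above, here is a minimal sketch in plain Java. It is a simplified model and does not use the Berkeley DB replication API: the `Node` class, its fields and `electMaster` are hypothetical names. It assumes that the surviving node with the most recent log wins, that priority breaks ties, and that a priority of 0 excludes a node from becoming master. Quorum and network handling are left out.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Simplified model of master election; not the Berkeley DB replication API. */
final class ElectionSketch {

    /** Hypothetical view of one replication node. */
    static final class Node {
        final String name;
        final long logPosition; // how far this node's log has advanced
        final int priority;     // assumption: 0 means "never elect this node"

        Node(String name, long logPosition, int priority) {
            this.name = name;
            this.logPosition = logPosition;
            this.priority = priority;
        }
    }

    /** Picks the eligible node with the most recent log; priority breaks ties. */
    static Optional<Node> electMaster(List<Node> survivors) {
        return survivors.stream()
                .filter(n -> n.priority > 0)
                .max(Comparator.comparingLong((Node n) -> n.logPosition)
                        .thenComparingInt(n -> n.priority));
    }

    public static void main(String[] args) {
        List<Node> survivors = Arrays.asList(
                new Node("replica-a", 100L, 10),
                new Node("replica-b", 120L, 5),
                new Node("replica-c", 120L, 0));
        // replica-c has the newest log but priority 0, so replica-b is chosen.
        System.out.println(electMaster(survivors).map(n -> n.name).orElse("none"));
    }
}
```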

Size
- total: max. of 256 terabytes
- container size: limited by the file system (http://www.oracle.com/technology/products/berkeley-db/faq/xml_faq.html#17)

XML Encodings
- ASCII, UTF-8, UTF-16 (big/little endian), UCS4 (big/little endian), the EBCDIC code pages IBM037, IBM1047 and IBM1140, ISO-8859-1 (aka Latin1) and Windows-1252.

See: http://xerces.apache.org/xerces-c/faq-parse.html#faq-21, http://www.oracle.com/technology/products/berkeley-db/faq/xml_faq.html#1

Segmentation
- XML documents can be grouped by whatever characteristic you deem right, either into the same container or into separate ones.
- It is possible to query over more than one container at the same time.
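
As a rough illustration of querying several containers at once, the sketch below builds an XQuery expression that unions several `collection()` calls before applying a path. The container names, the path and the helper `MultiContainerQuery.build` are hypothetical. The code only assembles the query string; it assumes the named containers are available to the query engine that eventually evaluates it.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Builds an XQuery expression that spans several containers (names are hypothetical). */
final class MultiContainerQuery {

    /**
     * Joins collection() calls with the XQuery union operator and appends a path,
     * e.g. (collection("a.dbxml") | collection("b.dbxml"))/record.
     */
    static String build(List<String> containers, String path) {
        if (containers.isEmpty()) {
            throw new IllegalArgumentException("at least one container is required");
        }
        String union = containers.stream()
                // XQuery escapes a double quote inside a string literal by doubling it.
                .map(name -> "collection(\"" + name.replace("\"", "\"\"") + "\")")
                .collect(Collectors.joining(" | "));
        return "(" + union + ")" + path;
    }

    public static void main(String[] args) {
        // Prints: (collection("news.dbxml") | collection("mail.dbxml"))/record[title]
        System.out.println(build(List.of("news.dbxml", "mail.dbxml"), "/record[title]"));
    }
}
```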

Implementation Ideas

Scenario 1: Parallel Access from Different Clients on the Same Host

It is possible to configure BDB so that several client processes can access the data concurrently by sharing the underlying database files. This is called environment sharing. For this to work, BDB must be configured with its transaction control activated, as described in [| BerkeleyDBXML-Txn-JAVA.pdf].

However, this approach is limited to different clients on the same host. Placing the files on a shared resource such as a SAN or NFS is explicitly not a valid solution for accessing the same data from different hosts (see http://www.oracle.com/technology/products/berkeley-db/faq/db_faq.html#30).

Given the targeted environment of a distributed system, this scenario seems an unlikely use case for EILF. The only situation where it could nonetheless make sense is if all of the following conditions are met (IMHO this is unlikely to be the case):

- the client's pre-processing overhead for storing the XML data is relatively large
- that pre-processing time is spent before transactional synchronization
- parallelization with VM threads is less efficient than with native processes
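
The first two conditions can be read as an Amdahl's-law argument: only the pre-processing done before transactional synchronization can run in parallel across client processes. The sketch below estimates the resulting speedup under the simplifying assumption that the transactional store phase is fully serialized. That assumption, the timings and the name `SpeedupEstimate` are illustrative and not measurements of BDB.

```java
import java.util.Locale;

/** Amdahl-style estimate; assumes the store phase is fully serialized. */
final class SpeedupEstimate {

    /**
     * @param preprocessMs per-document pre-processing time (assumed to parallelize perfectly)
     * @param storeMs      per-document transactional store time (assumed fully serialized)
     * @param clients      number of client processes sharing the environment
     */
    static double speedup(double preprocessMs, double storeMs, int clients) {
        if (clients < 1 || preprocessMs < 0 || storeMs < 0 || preprocessMs + storeMs == 0) {
            throw new IllegalArgumentException("need clients >= 1 and a positive total time");
        }
        double oneClient = preprocessMs + storeMs;
        double manyClients = storeMs + preprocessMs / clients;
        return oneClient / manyClients;
    }

    public static void main(String[] args) {
        // Pre-processing dominates: four clients help noticeably (about 3.08x).
        System.out.println(String.format(Locale.ROOT, "%.2f", speedup(90, 10, 4)));
        // Store phase dominates: four clients barely help (about 1.08x).
        System.out.println(String.format(Locale.ROOT, "%.2f", speedup(10, 90, 4)));
    }
}
```

With these illustrative numbers the benefit all but disappears once the store phase dominates, which is why the conditions above are restrictive.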

Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.
