Difference between revisions of "SMILA/Documentation/XML storage"

(Xml Storage Service)
Line 13: Line 13:
 
It is suggested to publish the needed functionality as an OSGi Service with the possibility to run multiple instances which may or may not be running in the same JVM.
 
  
== Xml Storage Service ==
+
== Xml Storage Service (XSS)==
  
 
The intended usage of the XML Storage is very much that of a service or server (eg. like a real DB Server such as MySql, Oracle, etc.) as opposed to a library type implementation. Hence the implementation shall be done as an OSGi Service that is wired up with Declarative Services.
 
Line 19: Line 19:
 
The service itself must support multiple requests at the same time and therefore needs to be multi threaded. The intention is to use a connection-type approach as is the case for SQL DBs. That entails that multiple clients may connect to the service and each client may open possibly multiple connections that are used to query/store XML documents concurrently.  
 
  
 +
An OSGi service is still run and called within the same JVM. This is in contrast to normal DB services that typically run in their own process and hence communication is done via TCP/IP, pipes etc. 
  
An OSGi service is still run and called within the same JVM. This is in contrast to normal DB services that typically run in their own process and hence communication is done via TCP/IP, pipes etc. In the end we need to be able to access the Xml Storage Service remotely as well. This can and shall be done with SCA, which makes the matter transparent to the client and moves this aspect into the configuration of the setup/installation.
+
In a later phase, when supporting clustering for horizontal scaling purposes, the XSS needs to hide the clustering capability from its client and manage all its aspects fully transparently, making it purely a matter of configuration.
 
+
TODO :
+
 
+
* Figure out how this is done
+
* performance hit/overhead of SCA
+
  
 
== Xml Storage Use ==
 

Revision as of 10:51, 27 October 2008

XML storage

Introduction

The main use case of the XML Store shall be to store and retrieve XML documents, as well as to obtain a set of documents via an XPath/XQuery expression.

Within SMILA it is used to store the XML version of a Record object and is therefore used by several components, but only via the Blackboard.

It shall also serve as an infrastructure block for any component that needs a private XML storage. In this case the storage shall only be accessible to that component, to avoid any conflicts.

The first API draft defines and implements the basic CRUD operations. In-place modifications of sub nodes are not yet needed (Prio 2 or 3).

It is suggested to publish the needed functionality as an OSGi Service with the possibility to run multiple instances which may or may not be running in the same JVM.

Xml Storage Service (XSS)

The intended usage of the XML Storage is very much that of a service or server (e.g. like a real DB server such as MySQL, Oracle, etc.) as opposed to a library-type implementation. Hence the implementation shall be done as an OSGi Service that is wired up with Declarative Services.

The service itself must support multiple requests at the same time and therefore needs to be multithreaded. The intention is to use a connection-type approach, as is the case for SQL DBs. This means that multiple clients may connect to the service, and each client may open several connections that are used to query/store XML documents concurrently.

An OSGi service is still run and called within the same JVM. This is in contrast to normal DB services that typically run in their own process and hence communication is done via TCP/IP, pipes etc.

In a later phase, when supporting clustering for horizontal scaling purposes, the XSS needs to hide the clustering capability from its clients and manage all its aspects fully transparently, making it purely a matter of configuration.

Xml Storage Use

  • Retrieval of a doc may be done either by a string key or by formulating an XQuery, which returns a sequence of XML nodes (types) and as such may return whole documents or parts of a document

Because the storage scope is that of whole documents, we should also work with these as a whole. Although it is possible to convert an element node obtained via XQuery into a document (this involves extracting the element and all its content as text and then parsing it into a DOM), this process is obviously lengthy/costly. As such, we should store subsections of XML documents that we use often on their own (i.e. without their parent/containing document context) as separate entities. Obviously they need to be linked (internally in the Storage API?) so that we can clean them up properly.

  • common API uses might be good to encapsulate in their own layer, so that each client doesn't have to perform all low-level functions itself
  • original documents with the EilfID
  • Store the EilfID itself -> the key for its retrieval could be an MD5 hash
    • Needs normative calculation
    • The MD5 hash could be cached with the key itself so it doesn't have to be calculated each time
  • Other use cases/api needed?
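
As an illustration of the MD5-based key idea above, here is a minimal sketch in Java. The class name StorageKeys and the method md5Key are hypothetical and not part of any SMILA API. The sketch also assumes that the "normative calculation" (i.e. normalizing the ID string before hashing) has already happened, since the rules for that are not defined here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not a SMILA class): derives a storage key from an ID string. */
final class StorageKeys {
    private StorageKeys() {}

    /**
     * Returns the lowercase hex MD5 digest of the UTF-8 bytes of the given ID.
     * Assumption: the caller has already brought the ID into its normative form.
     */
    static String md5Key(String normalizedId) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(normalizedId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // %02x renders each byte as two hex digits
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is available in standard JDKs; treat its absence as a setup error.
            throw new IllegalStateException(e);
        }
    }
}
```

Caching the result next to the ID, as suggested in the list, would simply mean storing the returned string together with the ID object instead of recomputing it on every lookup.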


Binary Storage

Although it is possible to save binary objects in Berkeley DB XML and possibly other XML DBs, it is better to provide separate OSGi Services for these distinctly different storage types. Apart from this, according to Ralf Schuman, who investigated this matter, it seems that the performance for larger binary objects is not good with BDB.


API

It shall be possible to run different instances of XML storages, similar to the idea of having multiple instances of an MSSQL server running (on the same machine). Each such instance is controlled by configuration and identified by a name. The following items are part of the configuration:

  • Service instance name
  • segments
    • segments shall be used for grouping of XML documents
    • I'm not sure whether we want/need to declare the possible segments in advance. Doing so might also serve to limit the possible segments, as opposed to allowing clients to create some on the fly.
  • default segment
  • implementation bundle
    • The implementation bundle defines which bundle implements the service interface. For now we will have only one implementation, but there might be others. I also have the idea of providing two implementations: one that is streamlined for performance on a single-node installation, and another that is targeted at a distributed installation where all parameters are Serializable.
    • How is that called from code? How do we need to configure this in an OSGi-like way? Maybe that is part of the manifest!?
  • host
    • For communication in a distributed environment we will use SCA for remoting. Although this aspect is transparent, we still need to tell the service instance that is just starting whether it is the real service server or just a proxy service stub routing to the server.

Example: the data shall be stored on host S, hence the service instance I runs there in server mode. A client on host C that wants to use instance I calls a service instance P of the same name that runs in proxy mode and merely remotes the communication to I.

    • IMO it is enough to declare the host name/IP of the service server... [This does not work, since we also need SCA for inter-VM communication on the same host. How do we handle this? It might be that we need to resort to sockets/ports here after all.]
    • After talking to DS, it seems that SCA may handle this fully transparently, such that it creates the proxy itself without programming intervention.
  • TBC
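
The configuration items listed above could be captured roughly as follows. This is only a sketch under stated assumptions: the class XssInstanceConfig, its field names, the Mode enum, and the rule that the default segment must be among the declared segments are all hypothetical, since the list above explicitly leaves open whether segments are declared in advance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical configuration holder for one XML storage service instance. */
final class XssInstanceConfig {
    /** SERVER stores the data itself; PROXY only forwards calls to the server. */
    enum Mode { SERVER, PROXY }

    final String instanceName;
    final List<String> segments;
    final String defaultSegment;
    final String implementationBundle;
    final Mode mode;
    final String host; // host of the real service server

    XssInstanceConfig(String instanceName, List<String> segments, String defaultSegment,
                      String implementationBundle, Mode mode, String host) {
        if (instanceName == null || instanceName.isEmpty()) {
            throw new IllegalArgumentException("instance name is required");
        }
        // Assumption: segments are declared up front, so the default must be one of them.
        if (!segments.contains(defaultSegment)) {
            throw new IllegalArgumentException("default segment must be a declared segment");
        }
        this.instanceName = instanceName;
        this.segments = Collections.unmodifiableList(new ArrayList<>(segments));
        this.defaultSegment = defaultSegment;
        this.implementationBundle = implementationBundle;
        this.mode = mode;
        this.host = host;
    }

    /** Returns a copy with the same instance name and server host, but in proxy mode. */
    XssInstanceConfig asProxy() {
        return new XssInstanceConfig(instanceName, segments, defaultSegment,
                implementationBundle, Mode.PROXY, host);
    }
}
```

In terms of the example above, instance I on host S would use Mode.SERVER, while instance P would be the result of asProxy(): same name, same server host, but only forwarding calls. If SCA turns out to create the proxy transparently, the mode field may not be needed at all.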

The implementation itself might necessitate more items to be configured. For BDB these are:

  • Basically, it must be possible to set all properties of these classes/objects:
    • EnvironmentConfig
    • Environment
    • XmlManagerConfig
    • This could be done dynamically with a BeanHelper utility class and reflection, as done in Spring.
  • Segments of the generic interface map to containers, hence we need to have one container config per segment. This would indicate that segments must be declared in the general config.
  • TBC
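
The BeanHelper idea mentioned above could look roughly like the following sketch, which uses the standard java.beans introspection API to copy string-valued configuration entries onto a bean's setters. The class name BeanHelper is taken from the text, but its shape here is an assumption. DemoEnvConfig is a hypothetical stand-in so that the example runs without Berkeley DB on the classpath; it does not claim to mirror the real EnvironmentConfig.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;

/** Hypothetical utility: applies string properties to a JavaBean via reflection. */
final class BeanHelper {
    private BeanHelper() {}

    /** Sets every writable bean property that has an entry in props; other entries are ignored. */
    static void applyProperties(Object bean, Map<String, String> props) {
        try {
            for (PropertyDescriptor pd :
                    Introspector.getBeanInfo(bean.getClass()).getPropertyDescriptors()) {
                Method setter = pd.getWriteMethod();
                String raw = props.get(pd.getName());
                if (setter == null || raw == null) {
                    continue; // read-only property or no configured value
                }
                setter.invoke(bean, convert(raw, pd.getPropertyType()));
            }
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("could not apply properties", e);
        }
    }

    /** Converts the raw string to the few property types this sketch supports. */
    private static Object convert(String raw, Class<?> type) {
        if (type == String.class) return raw;
        if (type == boolean.class || type == Boolean.class) return Boolean.valueOf(raw);
        if (type == int.class || type == Integer.class) return Integer.valueOf(raw);
        if (type == long.class || type == Long.class) return Long.valueOf(raw);
        throw new IllegalArgumentException("unsupported property type: " + type);
    }

    /** Hypothetical stand-in for an implementation-specific config bean. */
    public static class DemoEnvConfig {
        private boolean allowCreate;
        private long cacheSize;

        public boolean isAllowCreate() { return allowCreate; }
        public void setAllowCreate(boolean allowCreate) { this.allowCreate = allowCreate; }
        public long getCacheSize() { return cacheSize; }
        public void setCacheSize(long cacheSize) { this.cacheSize = cacheSize; }
    }
}
```

With such a helper, the generic service configuration would only need to carry a flat map of property names and values per implementation object, and new properties would not require code changes.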

Until then, this is a first draft of the most needed methods:

  • Service
    • openConnection(String connectionString) : Connection
    • closeConnection(Connection con)
    • get/setAutocommit(Boolean value)
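
To make the draft above more concrete, here is a minimal Java sketch of how the service and connection could fit together. Only openConnection, closeConnection and the autocommit accessors come from the draft. The interface names, the store/get methods on the connection, and the in-memory implementation are assumptions added for illustration, and the autocommit flag is merely stored, since transaction semantics are not defined yet.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical connection: the draft does not yet define its methods. */
interface XmlStorageConnection {
    void store(String key, String xml);
    String get(String key); // returns null if no document is stored under key
}

/** Service methods as listed in the draft above. */
interface XmlStorageService {
    XmlStorageConnection openConnection(String connectionString);
    void closeConnection(XmlStorageConnection con);
    boolean getAutocommit();
    void setAutocommit(boolean value);
}

/** Illustrative in-memory implementation; a real one would delegate to an XML DB. */
final class InMemoryXmlStorageService implements XmlStorageService {
    // Shared by all connections; ConcurrentHashMap allows concurrent access.
    private final Map<String, String> documents = new ConcurrentHashMap<>();
    private volatile boolean autocommit = true; // stored only, not acted upon here

    @Override
    public XmlStorageConnection openConnection(String connectionString) {
        // The connection string is ignored in this sketch.
        return new InMemoryConnection();
    }

    @Override
    public void closeConnection(XmlStorageConnection con) {
        if (con instanceof InMemoryConnection) {
            ((InMemoryConnection) con).open = false;
        }
    }

    @Override
    public boolean getAutocommit() { return autocommit; }

    @Override
    public void setAutocommit(boolean value) { autocommit = value; }

    private final class InMemoryConnection implements XmlStorageConnection {
        private volatile boolean open = true;

        @Override
        public void store(String key, String xml) {
            ensureOpen();
            documents.put(key, xml);
        }

        @Override
        public String get(String key) {
            ensureOpen();
            return documents.get(key);
        }

        private void ensureOpen() {
            if (!open) {
                throw new IllegalStateException("connection is closed");
            }
        }
    }
}
```

A client would open a connection, store or fetch documents by key, and hand the connection back via closeConnection. Several clients can hold connections at the same time, which matches the connection-type approach described for the service.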

High Level API

The high level API is contained in the Blackboard and need not be duplicated here. As a consequence, this Service just offers a low level API.

Performance and Scaling

At this time the focus lies on producing a working solution. Once that exists, bottlenecks can be identified and addressed, for example by:

  • segment the storage
  • distribute it to hardware nodes
